|
ABSTRACT
Markov decision processes are an effective tool in modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data, learned from experience, or designed by hand, it is not surprising that the actual performance of a chosen strategy often significantly differs from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic points of view on MDP with parameter uncertainty. We describe tractable methods that take parameter uncertainty into account in the process of decision making. Finally, we propose a cost-effective exploration strategy when it is possible to invest (money, time or computation efforts) in actions that will reduce the uncertainty in the parameters.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
Calafiore, G., & El Ghaoui, L. (2006). On distributionally robust chance-constrained linear programs. Optimization Theory and Applications, 130, 1--22.
|
| |
4
|
Charnes, A., & Cooper, W. (1959). Chance constrained programming. Management Science, 6, 73--79.
|
| |
5
|
Dearden, R., Friedman, N., & Andre, D. (1999). Model-based Bayesian exploration. Proc. of Uncertainty in AI (pp. 150--159).
|
| |
6
|
Filar, J., Krass, D., & Ross, K. (1995). Percentile performance criteria for limiting average Markov control problems. IEEE Trans. on Automatic Control, 40, 2--10.
|
| |
7
|
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2003). Bayesian data analysis, second edition. Chapman & Hall/CRC.
|
| |
8
|
|
| |
9
|
Howard, R. (1966). Information value theory. IEEE Trans. on Systems Science and Cybernetics, SSC-2, 22--26.
|
| |
10
|
|
| |
11
|
|
| |
12
|
Lobo, M., Vandenberghe, L., Boyd, S., & Lebret, H. (1998). Applications of second order cone programming. Linear Algebra and its App., 284, 193--228.
|
| |
13
|
|
| |
14
|
Nemirovski, A., & Shapiro, A. (2006). Convex approximations of chance constrained programs. SIAM Journal on Optimization, 17, 969--996.
|
| |
15
|
|
| |
16
|
Prékopa, A. (1995). Stochastic programming. Kluwer Academic Publishers.
|
| |
17
|
|
| |
18
|
Silver, E. (1963). Markovian decision processes with uncertain transition probabilities or rewards (Technical Report 1). Operations Research Center, MIT.
|
 |
19
|
|
|