ACM Home Page
Please provide us with feedback. Feedback
Percentile optimization in uncertain Markov decision processes with application to efficient exploration
Full text PdfPdf (303 KB)
Source ICML; Vol. 227 archive
Proceedings of the 24th international conference on Machine learning table of contents
Corvalis, Oregon
Pages: 225 - 232  
Year of Publication: 2007
ISBN:978-1-59593-793-3
Authors
Erick Delage  Stanford University, Stanford, California
Shie Mannor  McGill University, Montreal, Quebec, Canada
Sponsor
: Machine Learning Journal
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 28,   Citation Count: 0
Additional Information:

abstract   references   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1273496.1273525
What is a DOI?

ABSTRACT

Markov decision processes are an effective tool in modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data, learned from experience, or designed by hand, it is not surprising that the actual performance of a chosen strategy often significantly differs from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic points of view on MDP with parameter uncertainty. We describe tractable methods that take parameter uncertainty into account in the process of decision making. Finally, we propose a cost-effective exploration strategy when it is possible to invest (money, time or computation efforts) in actions that will reduce the uncertainty in the parameters.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
Calafiore, G., & El Ghaoui, L. (2006). On distributionally robust chance-constrained linear programs. Optimization Theory and Applications, 130, 1--22.
 
4
Charnes, A., & Cooper, W. (1959). Chance constrained programming. Management Science, 6, 73--79.
 
5
Dearden, R., Friedman, N., & Andre, D. (1999). Model-based Bayesian exploration. Proc. of Uncertainty in AI (pp. 150--159).
 
6
Filar, J., Krass, D., & Ross, K. (1995). Percentile performance criteria for limiting average Markov control problems. IEEE Trans. on Automatic Control, 40, 2--10.
 
7
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2003). Bayesian data analysis, second edition. Chapman & Hall/CRC.
 
8
 
9
Howard, R. (1966). Information value theory. IEEE Trans. on Systems Science and Cybernetics, SSC-2, 22--26.
 
10
 
11
 
12
Lobo, M., Vandenberghe, L., Boyd, S., & Lebret, H. (1998). Applications of second order cone programming. Linear Algebra and its App., 284, 193--228.
 
13
 
14
Nemirovski, A., & Shapiro, A. (2006). Convex approximations of chance constrained programs. SIAM Journal on Optimization, 17, 969--996.
 
15
 
16
Prékopa, A. (1995). Stochastic programming. Kluwer Academic Publishers.
 
17
 
18
Silver, E. (1963). Markovian decision processes with uncertain transition probabilities or rewards (Technical Report 1). Operations Research Center, MIT.
19
Collaborative Colleagues:
Erick Delage: colleagues
Shie Mannor: colleagues