A theoretical analysis of Model-Based Interval Estimation
Source: ACM International Conference Proceeding Series; Vol. 119
Proceedings of the 22nd international conference on Machine learning
Bonn, Germany
Pages: 856-863
Year of Publication: 2005
ISBN: 1-59593-180-5
Authors
Alexander L. Strehl, Rutgers University, Piscataway, NJ
Michael L. Littman, Rutgers University, Piscataway, NJ
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1102351.1102459

ABSTRACT

Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less "online" cousins from the literature.

