A theoretical analysis of Model-Based Interval Estimation
Source
ACM International Conference Proceeding Series; Vol. 119
Proceedings of the 22nd International Conference on Machine Learning
Bonn, Germany
Pages: 856-863
Year of Publication: 2005
ISBN: 1-59593-180-5
ABSTRACT
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less "online" cousins from the literature.
CITED BY 10
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, Michael L. Littman. PAC model-free reinforcement learning. Proceedings of the 23rd International Conference on Machine Learning, pp. 881-888, June 25-29, 2006, Pittsburgh, Pennsylvania.