PAC model-free reinforcement learning
Source ACM International Conference Proceeding Series; Vol. 148 archive
Proceedings of the 23rd international conference on Machine learning table of contents
Pittsburgh, Pennsylvania
Pages: 881-888
Year of Publication: 2006
ISBN: 1-59593-383-2
Authors
Alexander L. Strehl  Rutgers University, Piscataway, NJ
Lihong Li  Rutgers University, Piscataway, NJ
Eric Wiewiora  University of California, San Diego
John Langford  TTI-Chicago, Chicago, IL
Michael L. Littman  Rutgers University, Piscataway, NJ
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1143844.1143955

ABSTRACT

For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm: Delayed Q-learning. We prove it is PAC, achieving near-optimal performance except for Õ(SA) timesteps using O(SA) space, improving on the Õ(S²A) bounds of the best previous algorithms. This result proves that efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience; neither resets nor parallel sampling is used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.
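
The space claim can be illustrated with a rough count, assuming a purely tabular representation. The sketch below is not taken from the paper, and the function names are hypothetical. It only compares the number of entries in a table of action values, one per (state, action) pair, with the number of entries in a tabular transition model, one per (state, action, next state) triple. The first grows as SA and the second as S²A.

```python
def q_table_entries(num_states: int, num_actions: int) -> int:
    """Entries in a tabular action-value function: one per (state, action).

    Hypothetical helper for illustration; not part of the paper.
    """
    return num_states * num_actions


def tabular_model_entries(num_states: int, num_actions: int) -> int:
    """Entries in a tabular transition model: one per (state, action, next state).

    Hypothetical helper for illustration; not part of the paper.
    """
    return num_states * num_actions * num_states


if __name__ == "__main__":
    # Compare storage growth for a fixed number of actions per state.
    for S, A in [(10, 4), (100, 4), (1000, 4)]:
        print(f"S={S}, A={A}: "
              f"value table={q_table_entries(S, A)}, "
              f"transition model={tabular_model_entries(S, A)}")
```

With the number of actions held fixed, a tenfold increase in S multiplies the value table size by ten and the transition model size by one hundred.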

