Online exploration in least-squares policy iteration
Source
International Conference on Autonomous Agents archive
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2 table of contents
Budapest, Hungary
SESSION: Multi-agent learning table of contents
Pages 733-739  
Year of Publication: 2009
ISBN: 978-0-9817381-7-8
Authors
Lihong Li  Rutgers University, Piscataway, NJ
Michael L. Littman  Rutgers University, Piscataway, NJ
Christopher R. Mansley  Rutgers University, Piscataway, NJ
Sponsors
The Foundation for Intelligent Physical Agents
Microsoft Research
Whitestein Technologies
European Office of Aerospace Research and Development, Air Force Office of Scientific Research, United States Air Force Research Laboratory
Drexel University
Wiley-Blackwell Ltd

ABSTRACT

One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation has to be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, Rmax, into a state-of-the-art learning algorithm, least-squares policy iteration (LSPI). This approach combines the strengths of both methods and, in several benchmark problems, proves more effective than LSPI paired with two other popular exploration rules.
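
The abstract does not spell out how Rmax is integrated into LSPI, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes a state-action pair counts as "unknown" until it has been visited `M_KNOWN` times, and that unknown pairs are valued optimistically at `V_MAX = R_MAX / (1 - GAMMA)`. The five-state chain environment, the one-hot features, and every name and constant below are illustrative assumptions.

```python
import numpy as np

# All constants are assumptions made for this illustration.
GAMMA = 0.9                      # discount factor
R_MAX = 1.0                      # assumed upper bound on the reward
V_MAX = R_MAX / (1.0 - GAMMA)    # optimistic value given to unknown pairs
M_KNOWN = 2                      # visits needed before a pair counts as known
N_STATES, N_ACTIONS = 5, 2       # toy chain: action 0 = left, action 1 = right


def phi(s, a):
    """One-hot features; a stand-in for a compact basis."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


def q_value(w, counts, s, a):
    """Rmax-style optimism: unknown pairs look maximally valuable."""
    if counts[s, a] < M_KNOWN:
        return V_MAX
    return float(phi(s, a) @ w)


def greedy(w, counts, s):
    return int(np.argmax([q_value(w, counts, s, a) for a in range(N_ACTIONS)]))


def lstdq_optimistic(samples, w, counts):
    """One LSTD-Q solve; the next-state value is V_MAX when the greedy
    next pair is still unknown."""
    k = N_STATES * N_ACTIONS
    A = 1e-6 * np.eye(k)         # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s2 in samples:
        f = phi(s, a)
        a2 = greedy(w, counts, s2)
        if counts[s2, a2] < M_KNOWN:
            A += np.outer(f, f)
            b += f * (r + GAMMA * V_MAX)
        else:
            A += np.outer(f, f - GAMMA * phi(s2, a2))
            b += f * r
    return np.linalg.solve(A, b)


def step(s, a):
    """Deterministic chain; reward 1 only for moving right in the last state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if (s == N_STATES - 1 and a == 1) else 0.0
    return r, s2


def run(n_steps=100, sweeps=3):
    """Online loop: act greedily on the optimistic Q, then re-fit."""
    counts = np.zeros((N_STATES, N_ACTIONS), dtype=int)
    w = np.zeros(N_STATES * N_ACTIONS)
    samples, s = [], 0
    for _ in range(n_steps):
        a = greedy(w, counts, s)
        r, s2 = step(s, a)
        samples.append((s, a, r, s2))
        counts[s, a] += 1
        for _ in range(sweeps):  # a few policy-iteration sweeps per step
            w = lstdq_optimistic(samples, w, counts)
        s = s2
    return w, counts
```

Calling `run()` returns the learned weights and the visit counts. In this sketch the optimism is what drives the agent to try every state-action pair before it settles on moving right; no separate exploration rate is involved.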

