ABSTRACT
One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation must be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, Rmax, into a state-of-the-art learning algorithm, least-squares policy iteration (LSPI). The resulting approach combines the strengths of both methods. In several benchmark problems it proved effective and outperformed LSPI combined with either of two other popular exploration rules.
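The following Python sketch makes the combination concrete by showing LSPI with Rmax-style optimism. It illustrates the idea under stated assumptions and is not the paper's exact algorithm. The assumptions are as follows. A state-action pair counts as "known" once it appears in at least `m` samples, which is a simple count that only suits discrete states; continuous spaces would need something like a distance-based test. Unknown pairs are assigned the optimistic value V_max = R_max / (1 - gamma). When a sample's next pair is unknown, LSTDQ bootstraps from V_max instead of from the learned weights. The names `lspi_rmax`, `solve`, `one_hot`, `m`, and `r_max` are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# (state, action, reward, next_state)
Sample = Tuple[Hashable, int, float, Hashable]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lspi_rmax(samples: Sequence[Sample],
              phi: Callable[[Hashable, int], List[float]],
              k: int, actions: Sequence[int], gamma: float,
              r_max: float, m: int, iters: int = 20, ridge: float = 1e-6):
    """Hypothetical LSPI loop with Rmax-style optimism for unknown pairs."""
    v_max = r_max / (1.0 - gamma)          # optimistic value of an unknown pair
    counts = Counter((s, a) for s, a, _, _ in samples)

    def known(s, a) -> bool:
        # Assumption: a pair is "known" after m visits (discrete states only).
        return counts[(s, a)] >= m

    def q(s, a, w) -> float:
        if not known(s, a):
            return v_max
        return sum(wi * fi for wi, fi in zip(w, phi(s, a)))

    def greedy(s, w) -> int:
        return max(actions, key=lambda a: q(s, a, w))

    w = [0.0] * k
    for _ in range(iters):
        # LSTDQ: accumulate A w = b for the current greedy policy.
        A = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for s, a, r, s2 in samples:
            if not known(s, a):
                continue                    # valued at v_max, not fitted
            f = phi(s, a)
            a2 = greedy(s2, w)
            if known(s2, a2):
                g = phi(s2, a2)
                diff = [fi - gamma * gi for fi, gi in zip(f, g)]
                target = r
            else:
                diff = f                    # bootstrap from v_max instead
                target = r + gamma * v_max
            for i in range(k):
                b[i] += f[i] * target
                for j in range(k):
                    A[i][j] += f[i] * diff[j]
        new_w = solve(A, b)
        converged = max(abs(x - y) for x, y in zip(new_w, w)) < 1e-8
        w = new_w
        if converged:
            break
    final_w = w
    return final_w, lambda s: greedy(s, final_w)


def one_hot(s, a):  # tabular features: 2 states x 2 actions
    v = [0.0] * 4
    v[s * 2 + a] = 1.0
    return v


# Only action 0 in state 0 has been tried (twice, with m = 2).
w, policy = lspi_rmax([(0, 0, 0.0, 0)] * 2, one_hot, 4, [0, 1], 0.9, 1.0, m=2)
print(policy(0))  # 1: the untried action is valued at V_max = 10
```

Every untried action starts at V_max, so the greedy policy prefers it over any known action with a lower estimate, and this preference drives exploration. Once an action has been tried `m` times, LSTDQ estimates its value from data instead. In the example, w[0] settles near 9 = 0 + 0.9 * 10, because the sample bootstraps from the still-unknown action 1.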