An empirical analysis of value function-based and policy search reinforcement learning
Source
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Budapest, Hungary
Session: Multi-agent learning
Pages 749-756  
Year of Publication: 2009
ISBN:978-0-9817381-7-8
Authors
Shivaram Kalyanakrishnan, The University of Texas at Austin
Peter Stone, The University of Texas at Austin
Sponsors
The Foundation for Intelligent Physical Agents
Microsoft Research
Whitestein Technologies
European Office of Aerospace Research and Development, Air Force Office of Scientific Research, United States Air Force Research Laboratory
Drexel University
Wiley-Blackwell Ltd
Bibliometrics
Downloads (6 Weeks): 13, Downloads (12 Months): 37, Citation Count: 0

ABSTRACT

In several agent-oriented scenarios in the real world, an autonomous agent that is situated in an unknown environment must learn through a process of trial and error to take actions that result in long-term benefit. Reinforcement Learning (or sequential decision making) is a paradigm well-suited to this requirement. Value function-based methods and policy search methods are contrasting approaches to solve reinforcement learning tasks. While both classes of methods benefit from independent theoretical analyses, these often fail to extend to the practical situations in which the methods are deployed. We conduct an empirical study to examine the strengths and weaknesses of these approaches by introducing a suite of test domains that can be varied for problem size, stochasticity, function approximation, and partial observability. Our results indicate clear patterns in the domain characteristics for which each class of methods excels. We investigate whether their strengths can be combined, and develop an approach to achieve that purpose. The effectiveness of this approach is also demonstrated on the challenging benchmark task of robot soccer Keepaway. We highlight several lines of inquiry that emanate from this study.
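The abstract contrasts value function-based methods with policy search methods. The sketch below illustrates that contrast on a hypothetical five-state chain task, which is not one of the paper's test domains. Tabular Sarsa (a value function-based method) learns action values Q(s, a) from individual transitions, while the cross-entropy method (a policy search method) treats the policy parameters as a black box, scores each candidate by the return of a whole episode, and refits a Gaussian sampling distribution to the best-scoring candidates. The chain task, the helper names (`step`, `rollout`, `sarsa`, `cross_entropy`), and all hyperparameters are illustrative assumptions, not the authors' experimental setup.

```python
import random

# Hypothetical toy task (not from the paper): a 5-state chain. The agent starts
# in state 0; action 1 moves right, action 0 moves left (clamped at state 0).
# Every step yields reward -1 and the episode ends on reaching the last state.
N_STATES = 5
GOAL = N_STATES - 1
MAX_STEPS = 50  # episode cutoff so that poor policies still terminate


def step(state, action):
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, -1.0, nxt == GOAL


def rollout(theta):
    """Return of the deterministic policy 'move right iff theta[s] > 0'."""
    s, total = 0, 0.0
    for _ in range(MAX_STEPS):
        s, r, done = step(s, 1 if theta[s] > 0 else 0)
        total += r
        if done:
            break
    return total


def sarsa(episodes=500, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Value function-based: update Q(s, a) after every transition."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]

    def choose(s):  # epsilon-greedy action selection, random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(q[s])
        return rng.choice([a for a in (0, 1) if q[s][a] == best])

    for _ in range(episodes):
        s = 0
        a = choose(s)
        for _ in range(MAX_STEPS):
            s2, r, done = step(s, a)
            if done:  # terminal transition: no bootstrapped term
                q[s][a] += alpha * (r - q[s][a])
                break
            a2 = choose(s2)
            q[s][a] += alpha * (r + gamma * q[s2][a2] - q[s][a])
            s, a = s2, a2
    return q


def cross_entropy(iterations=20, pop=50, n_elite=10, seed=0):
    """Policy search: refit a Gaussian over policy parameters to the elite."""
    rng = random.Random(seed)
    mean, std = [0.0] * N_STATES, [1.0] * N_STATES
    for _ in range(iterations):
        samples = [[rng.gauss(m, sd) for m, sd in zip(mean, std)]
                   for _ in range(pop)]
        # Each candidate is scored only by its episode return.
        elite = sorted(samples, key=rollout, reverse=True)[:n_elite]
        mean = [sum(x[i] for x in elite) / n_elite for i in range(N_STATES)]
        std = [max((sum((x[i] - mean[i]) ** 2 for x in elite) / n_elite) ** 0.5,
                   0.05)  # floor keeps some exploration
               for i in range(N_STATES)]
    return mean


if __name__ == "__main__":
    q = sarsa()
    greedy_theta = [q[s][1] - q[s][0] for s in range(N_STATES)]
    print("Sarsa greedy-policy return:", rollout(greedy_theta))
    print("CEM mean-policy return:", rollout(cross_entropy()))
```

On this chain the best achievable return is -4 (four steps to the goal). The point of the comparison is where learning happens: Sarsa assigns credit to individual state-action pairs and bootstraps from its own estimates, whereas the cross-entropy method never estimates values at all and relies only on episode returns, which is why it needs many full rollouts per iteration.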


