Generalized model learning for reinforcement learning in factored domains
Source
International Conference on Autonomous Agents
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Budapest, Hungary
SESSION: Multi-agent learning
Pages 717-724
Year of Publication: 2009
ISBN: 978-0-9817381-7-8
Authors
Todd Hester, University of Texas at Austin, Austin, TX
Peter Stone, University of Texas at Austin, Austin, TX
Sponsors
The Foundation for Intelligent Physical Agents
Microsoft Research
Whitestein Technologies
European Office of Aerospace Research and Development, Air Force Office of Scientific Research, United States Air Force Research Laboratory
Drexel University
Wiley-Blackwell Ltd

ABSTRACT

Improving the sample efficiency of reinforcement learning algorithms to scale up to larger and more realistic domains is a current research challenge in machine learning. Model-based methods use experiential data more efficiently than model-free approaches but often require exhaustive exploration to learn an accurate model of the domain. We present an algorithm, Reinforcement Learning with Decision Trees (RL-DT), that uses supervised learning techniques to learn the model by generalizing the relative effect of actions across states. Specifically, RL-DT uses decision trees to model the relative effects of actions in the domain. The agent explores the environment exhaustively in early episodes when its model is inaccurate. Once it believes it has developed an accurate model, it exploits its model, taking the optimal action at each step. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. The sample efficiency of the algorithm is evaluated empirically in comparison to five other algorithms across three domains. RL-DT consistently accrues high cumulative rewards in comparison with the other algorithms tested.
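
The idea of generalizing the relative effect of actions across states can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the 5x5 grid world, the hand-rolled greedy tree learner with equality splits, and all names (`build_tree`, `predict_next_x`, and so on) are hypothetical. The tree is trained to predict the change in the x coordinate from transitions observed in only two rows of the grid, and is then queried in states it has never visited.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels):
    """Greedy tree over discrete features using tests of the form feature == value."""
    majority = Counter(labels).most_common(1)[0][0]
    impurity = gini(labels)
    if impurity == 0.0:
        return majority  # pure leaf
    best = None
    for f in range(len(rows[0])):
        for v in sorted({r[f] for r in rows}):
            yes = [i for i, r in enumerate(rows) if r[f] == v]
            no = [i for i, r in enumerate(rows) if r[f] != v]
            if not yes or not no:
                continue
            score = (len(yes) * gini([labels[i] for i in yes])
                     + len(no) * gini([labels[i] for i in no])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, v, yes, no)
    if best is None or best[0] >= impurity - 1e-12:
        return majority  # no split reduces impurity
    _, f, v, yes, no = best
    return (f, v,
            build_tree([rows[i] for i in yes], [labels[i] for i in yes]),
            build_tree([rows[i] for i in no], [labels[i] for i in no]))

def predict(tree, row):
    """Follow the tests down to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(tree, tuple):
        f, v, yes_branch, no_branch = tree
        tree = yes_branch if row[f] == v else no_branch
    return tree

# Hypothetical 5x5 grid world with walls at the border (an assumption).
SIZE = 5
EAST, WEST, NORTH, SOUTH = range(4)
MOVES = {EAST: (1, 0), WEST: (-1, 0), NORTH: (0, 1), SOUTH: (0, -1)}

def step(x, y, action):
    dx, dy = MOVES[action]
    return min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1)

# Experience gathered only in rows y = 0 and y = 1.
rows, dx_labels = [], []
for x in range(SIZE):
    for y in (0, 1):
        for a in MOVES:
            nx, _ = step(x, y, a)
            rows.append((x, y, a))
            dx_labels.append(nx - x)  # relative effect, not the absolute next state

dx_tree = build_tree(rows, dx_labels)

def predict_next_x(x, y, action):
    """Predicted next x = current x + predicted relative change."""
    return x + predict(dx_tree, (x, y, action))

print(predict_next_x(2, 4, EAST))  # state (2, 4) never visited; prints 3
print(predict_next_x(4, 3, EAST))  # blocked by the east wall; prints 4
```

Because the label is the change in x and not the next value of x, the tree needs only a handful of leaves and its predictions carry over to unvisited rows. A complete model in this style would also need a predictor for the other state feature and for the reward; those are omitted here to keep the sketch short.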


