ACM Home Page
Please provide us with feedback. Feedback
Expediting RL by using graphical structures
Full text PdfPdf (376 KB)
Source
International Conference on Autonomous Agents archive
Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3 table of contents
Estoril, Portugal
SESSION: Agent and multi-agent learning table of contents
Pages 1325-1328  
Year of Publication: 2008
ISBN:978-0-9817381-2-X
Authors
Peng Dai  University of Washington, Seattle, WA
Alexander L. Strehl  Yahoo! Research, New York
Judy Goldsmith  University of Kentucky, Lexington, KY
Sponsors
ACM: Association for Computing Machinery
AAAI : Association for the Advancement of Artifical Intelligence
Publisher
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 39,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

The goal of Reinforcement learning (RL) is to maximize reward (minimize cost) in a Markov decision process (MDP) without knowing the underlying model a priori. RL algorithms tend to be much slower than planning algorithms, which require the model as input. Recent results demonstrate that MDP planning can be expedited, by exploiting the graphical structure of the MDP. We present extensions to two popular RL algorithms, Q-learning and RMax, that learn and exploit the graphical structure of problems to improve overall learning speed. Use of the graphical structure of the underlying MDP can greatly improve the speed of planning algorithms, if the underlying MDP has a nontrivial topological structure. Our experiments show that use of the apparent topological structure of an MDP speeds up reinforcement learning, even if the MDP is simply connected.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. J. of Artificial Intelligence Research, 11:1--94, 1999.
 
3
 
4
P. Dai and J. Goldsmith. Topological value iteration algorithm for Markov decision processes. In Proc. IJCAI-07, pages 1860--1865, 2007.
 
5
 
6
S. M. Kakade. On the sample complexity of reinforcement learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
 
7
 
8
 
9
R. Munos and A. Moore. Influence and variance of a Markov chain: Application to adaptive discretization in optimal control. In Proc. of IEEE Conference on Decision and Control, 1999.
10
11
 
12
 
13
C. J. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, UK, 1989.
 
14
 
15

Collaborative Colleagues:
Peng Dai: colleagues
Alexander L. Strehl: colleagues
Judy Goldsmith: colleagues