Dynamic abstraction in reinforcement learning via clustering
Source: ACM International Conference Proceeding Series; Vol. 69
Proceedings of the Twenty-First International Conference on Machine Learning
Banff, Alberta, Canada
Page: 71
Year of Publication: 2004
ISBN: 1-58113-828-5
Authors
Shie Mannor  Massachusetts Institute of Technology, Cambridge, MA
Ishai Menache  Technion, Israel
Amit Hoze  Technion, Israel
Uri Klein  Technion, Israel
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1015330.1015355

ABSTRACT

We consider a graph-theoretic approach to the automatic construction of options in a dynamic environment. The learning agent generates a map of the environment on-line; the map represents the topological structure of the state transitions. A clustering algorithm then partitions the state space into distinct regions. Policies for reaching the different regions are learned separately and added to the model in the form of options (macro-actions). The options are used to accelerate the Q-Learning algorithm. We extend the basic algorithm by building a map that also includes a preliminary indication of the location of "interesting" regions of the state space, where the value gradient is significant and additional exploration might be beneficial. Experiments indicate significant speedups, especially in the initial learning phase.
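
The first two steps of the abstract (building a transition map on-line, then partitioning it into regions) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, so the code below swaps in a deliberately crude stand-in: it drops rarely traversed edges and takes the connected components of what remains. The names TransitionMap, observe, partition, and min_count are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from collections import defaultdict, deque


class TransitionMap:
    """Undirected graph of observed state transitions (hypothetical helper)."""

    def __init__(self):
        self.weights = defaultdict(int)    # frozenset({s, s_next}) -> visit count
        self.neighbors = defaultdict(set)  # state -> adjacent states

    def observe(self, s, s_next):
        """Record one transition seen by the agent while it acts."""
        self.neighbors[s]  # make sure the state exists as a node
        if s == s_next:
            return  # self-loops carry no topological information here
        self.weights[frozenset((s, s_next))] += 1
        self.neighbors[s].add(s_next)
        self.neighbors[s_next].add(s)


def partition(tmap, min_count):
    """Stand-in clustering (an assumption, not the paper's algorithm):
    ignore edges traversed fewer than min_count times, then return the
    connected components of the remaining graph as sorted lists."""
    seen = set()
    clusters = []
    for start in sorted(tmap.neighbors):
        if start in seen:
            continue
        seen.add(start)
        component = []
        queue = deque([start])
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in tmap.neighbors[u]:
                strong = tmap.weights[frozenset((u, v))] >= min_count
                if strong and v not in seen:
                    seen.add(v)
                    queue.append(v)
        clusters.append(sorted(component))
    return clusters


# Toy trajectory: two "rooms" {0, 1, 2} and {3, 4, 5} joined by a single
# crossing between states 2 and 3.
trajectory = [0, 1, 2, 1, 0, 1, 2, 3, 4, 5, 4, 3, 4, 5]
tmap = TransitionMap()
for s, s_next in zip(trajectory, trajectory[1:]):
    tmap.observe(s, s_next)

clusters = partition(tmap, min_count=2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

In this toy run the edge between states 2 and 3 is crossed only once, so it falls below the threshold and the two rooms come out as separate regions. Each region would then be a candidate target for an option policy; learning those policies and using them within Q-Learning is not shown here.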

