Automatic discovery and transfer of MAXQ hierarchies
Source
ICML; Vol. 307
Proceedings of the 25th international conference on Machine learning
Helsinki, Finland
Pages 648-655
Year of Publication: 2008
ISBN: 978-1-60558-205-4
Bibliometrics
Downloads (6 Weeks): 7, Downloads (12 Months): 47, Citation Count: 1
ABSTRACT
We present an algorithm, HI-MAT (Hierarchy Induction via Models And Trajectories), that discovers MAXQ task hierarchies by applying dynamic Bayesian network models to a successful trajectory from a source reinforcement learning task. HI-MAT discovers subtasks by analyzing the causal and temporal relationships among the actions in the trajectory. Under appropriate assumptions, HI-MAT induces hierarchies that are consistent with the observed trajectory and have compact value-function tables employing safe state abstractions. We demonstrate empirically that HI-MAT constructs compact hierarchies that are comparable to manually engineered hierarchies and that facilitate significant speedup in learning when transferred to a target task.
CITED BY
Peng Zang, Peng Zhou, David Minnen, Charles Isbell, Discovering options from example trajectories, Proceedings of the 26th Annual International Conference on Machine Learning, p.1217-1224, June 14-18, 2009, Montreal, Quebec, Canada