Learning complex motions by sequencing simpler motion templates
Source
ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 753-760
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
Gerhard Neumann, Graz University of Technology, Graz, Austria
Wolfgang Maass, Graz University of Technology, Graz, Austria
Jan Peters, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
ABSTRACT
Abstraction of complex, longer motor tasks into simpler elemental movements enables humans and animals to exhibit motor skills which have not yet been matched by robots. Humans intuitively decompose complex motions into smaller, simpler segments. For example, when describing simple movements such as drawing a triangle with a pen, we can easily name the basic steps of the movement. Surprisingly, such abstractions have rarely been used in artificial motor skill learning algorithms. These algorithms typically choose a new action (such as a torque or a force) at a very fast time-scale. As a result, both the policy and the temporal credit assignment problem become unnecessarily complex, often beyond the reach of current machine learning methods. We introduce a new framework for temporal abstractions in reinforcement learning (RL), i.e., RL with motion templates. We present a new algorithm for this framework which can learn high-quality policies by making only a few abstract decisions.
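To make the idea of temporal abstraction concrete, the following is a minimal sketch of the triangle example from the abstract. It is not the algorithm of the paper and involves no learning. The `LineTemplate` class, its parameters (`dx`, `dy`, `duration`), the `execute` helper, and the control period `DT` are all hypothetical names and values chosen for illustration. The sketch only shows how a few parametrized segments can expand into many low-level steps.

```python
from dataclasses import dataclass

DT = 0.01  # low-level control period in seconds (assumed value)


@dataclass
class LineTemplate:
    """Hypothetical motion template: move the pen in a straight line by (dx, dy)."""
    dx: float
    dy: float
    duration: float  # seconds the template runs before the next decision

    def rollout(self, start):
        """Return the pen positions visited while executing this template."""
        steps = round(self.duration / DT)
        x0, y0 = start
        return [(x0 + self.dx * k / steps, y0 + self.dy * k / steps)
                for k in range(1, steps + 1)]


def execute(templates, start=(0.0, 0.0)):
    """Run templates one after another; each template is one abstract decision."""
    path = [start]
    for template in templates:
        path.extend(template.rollout(path[-1]))
    return path


# Three abstract decisions describe the whole triangle.
triangle = [
    LineTemplate(1.0, 0.0, 1.0),
    LineTemplate(-0.5, 1.0, 1.0),
    LineTemplate(-0.5, -1.0, 1.0),
]
path = execute(triangle)
print(len(triangle), "abstract decisions,", len(path) - 1, "low-level steps")
```

With these assumed values, three decisions stand in for 300 low-level steps. A learner working at the template level would only have to assign credit across those three choices, instead of across every individual step.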