ABSTRACT
Temporal difference (TD) learning methods [22] have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found very slow in practice. A key feature of TD methods is that they represent policies in terms of value functions. In this paper we introduce behavior transfer, a novel approach to speeding up TD learning by transferring the learned value function from one task to a second related task. We present experimental results showing that autonomous learners are able to learn one multiagent task and then use behavior transfer to markedly reduce the total training time for a more complex task.
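The idea of transferring a learned value function can be illustrated with a minimal sketch. It assumes a tabular action-value function and a pair of hand-written mappings from target-task states and actions back to source-task ones. The names `transfer_q`, `td_update`, `map_state`, and `map_action` are hypothetical and chosen for illustration only; the abstract does not specify how the transfer is carried out, so this is not the authors' implementation.

```python
def transfer_q(source_q, target_states, target_actions, map_state, map_action):
    """Initialise a target-task Q-table from a source-task Q-table.

    map_state and map_action are assumed, user-supplied functions that send
    each target state/action to the source state/action it resembles.
    Pairs with no learned source value start at 0.0.
    """
    target_q = {}
    for s in target_states:
        for a in target_actions:
            source_key = (map_state(s), map_action(a))
            target_q[(s, a)] = source_q.get(source_key, 0.0)
    return target_q


def td_update(q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning style TD update in place and return the new value."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    td_error = reward + gamma * best_next - q.get((s, a), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
    return q[(s, a)]


if __name__ == "__main__":
    # Toy source task: one state, two actions, values assumed already learned.
    source_q = {(0, "hold"): 1.0, (0, "pass"): 2.0}

    # Toy target task: states carry an extra feature, and there are two passes.
    target_states = [(0, 0), (0, 1)]
    target_actions = ["hold", "pass1", "pass2"]

    target_q = transfer_q(
        source_q,
        target_states,
        target_actions,
        map_state=lambda s: s[0],  # drop the extra feature
        map_action=lambda a: "pass" if a.startswith("pass") else a,
    )

    # Learning in the target task continues from the transferred values.
    td_update(target_q, (0, 0), "pass1", 1.0, (0, 1), target_actions)
    print(target_q)
```

The learner in the second task starts from the transferred estimates rather than from zero and keeps refining them with ordinary TD updates, which is where any saving in training time would come from.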
REFERENCES
[4] M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda. Vision-based behavior acquisition for a shooting robot by using a reinforcement learning. In Proc. of IAPR/IEEE Workshop on Visual Behaviors-1994, pages 112--118, 1994.
[5] R. Boer and J. Kok. The Incremental Development of a Synthetic Multi-agent System: The UvA Trilearn 2001 Robotic Soccer Simulation Team. Master's thesis, University of Amsterdam, The Netherlands, February 2002.
[6] M. Colombetti and M. Dorigo. Robot Shaping: Developing Situated Agents through Learning. Technical Report TR-92-040, International Computer Science Institute, Berkeley, CA, 1993.
[7] C. Drummond. Accelerating reinforcement learning by composing solutions of automatically identified subtasks. Journal of Artificial Intelligence Research, 16:59--104, 2002.
[9] A. Fern, S. Yoon, and R. Givan. Approximate policy iteration with a policy language bias. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
[10] C. Guestrin, D. Koller, C. Gearhart, and N. Kanodia. Generalizing plans to new environments in relational MDPs. In International Joint Conference on Artificial Intelligence (IJCAI-03), Acapulco, Mexico, August 2003.
[11] M. J. Mataric. Reward functions for accelerated learning. In International Conference on Machine Learning, pages 181--189, 1994.
[12] E. F. Morales. Scaling up reinforcement learning with a relational representation. In Proc. of the Workshop on Adaptability in Multi-agent Systems, January 2003.
[14] B. Price and C. Boutilier. Accelerating reinforcement learning through implicit imitation. Journal of Artificial Intelligence Research, 19:569--629, 2003.
[16] M. A. Riedmiller, A. Merke, D. Meier, A. Hoffman, A. Sinner, O. Thate, and R. Ehrmann. Karlsruhe Brainstormers - A Reinforcement Learning Approach to Robotic Soccer. In RoboCup 2000: Robot Soccer World Cup IV, pages 367--372, January 2001.
[17] O. Selfridge, R. S. Sutton, and A. G. Barto. Training and tracking in robotics. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence, pages 670--672, 1985.
[19] P. Stone, G. Kuhlmann, M. Taylor, and Y. Liu. Keepaway Soccer: From Machine Learning Testbed to Benchmark. In Proceedings of RoboCup International Symposium, 2005. To appear.
[21] P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup-soccer keepaway. Adaptive Behavior, 2005. To appear.