ABSTRACT
In many multi-agent learning problems, it is difficult to determine, a priori, the agent reward structure that will lead to good performance. This problem is particularly pronounced in continuous, noisy domains ill-suited to the simple table backup schemes commonly used in TD(λ)/Q-learning. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is a function only of the problem domain and the agents' reward structure. We then use this reward property visualization method to determine an effective reward without performing extensive simulations. We test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and where their actions are noisy (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a speedup of two orders of magnitude in selecting a good reward. Most importantly, it allows one to quickly create and verify rewards tailored to the observational limitations of the domain.