Multi-agent reward analysis for learning in noisy domains
Source: International Conference on Autonomous Agents
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
The Netherlands
SESSION: Papers: learning
Pages: 81-88
Year of Publication: 2005
ISBN:1-59593-093-0
Authors
Adrian Agogino  UC Santa Cruz, Moffett Field, CA
Kagan Tumer  NASA Ames Research Center, Moffett Field, CA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1082473.1082486

ABSTRACT

In many multi-agent learning problems, it is difficult to determine, a priori, the agent reward structure that will lead to good performance. This problem is particularly pronounced in continuous, noisy domains ill-suited to simple table backup schemes commonly used in TD(λ)/Q-learning. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents' reward structure. We then use this reward property visualization method to determine an effective reward without performing extensive simulations. We test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and where their actions are noisy (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-orders-of-magnitude speedup in selecting a good reward. Most importantly, it allows one to quickly create and verify rewards tailored to the observational limitations of the domain.



