Reward shaping for valuing communications during multi-agent coordination
Source
International Conference on Autonomous Agents
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Budapest, Hungary
SESSION: Coordination/DCOP/resource allocation
Pages 641-648
Year of Publication: 2009
ISBN:978-0-9817381-6-1
Authors
Simon A. Williamson, University of Southampton, Southampton, UK
Enrico H. Gerding, University of Southampton, Southampton, UK
Nicholas R. Jennings, University of Southampton, Southampton, UK
Sponsors
The Foundation for Intelligent Physical Agents
Microsoft Research
Wiley - Blackwell Ltd
Whitestein Technologies
European Office of Aerospace Research and Development, Air Force Office of Scientific Research, United States Air Force Research Laboratory
Drexel University

ABSTRACT

Decentralised coordination in multi-agent systems is typically achieved using communication. However, in many cases communication is expensive because bandwidth is limited, communicating may be dangerous, or communication may simply be unavailable at times. In this context, we argue for a rational approach to communication: if it has a cost, the agents should be able to calculate a value of communicating. By doing this, the agents can balance the need to communicate against the cost of doing so. In this research, we present a novel model of rational communication that uses reward shaping to value communications, and we employ this valuation in decentralised POMDP policy generation. In this context, reward shaping is the process by which expectations over joint actions are adjusted based on how coordinated the agent team is. An empirical evaluation of the benefits of this approach is presented in two domains. First, in the context of an idealised benchmark problem, the multiagent Tiger problem, our method is shown to require significantly less communication (up to 30% fewer messages) while still achieving a 30% performance improvement over the current state of the art. Second, in the context of a larger-scale problem, RoboCupRescue, our method is shown to scale well and to operate without recourse to significant amounts of domain knowledge.
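
The abstract does not give the valuation formulas, so the following is only an illustrative sketch of the idea of weighing a value of communicating against its cost. It assumes, purely for illustration, that team coordination is measured by the KL divergence between an agent's own belief and the belief it attributes to its team, and that the shaped reward subtracts a weighted divergence penalty from a nominal joint-action reward. The names `kl_divergence`, `shaped_reward`, `should_communicate`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with zero probability contribute nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def shaped_reward(joint_reward, own_belief, team_belief, weight=1.0):
    # Assumed shaping rule (not from the paper): lower the expected
    # joint-action reward as the two beliefs drift apart.
    return joint_reward - weight * kl_divergence(own_belief, team_belief)


def should_communicate(joint_reward, own_belief, team_belief, cost, weight=1.0):
    # Assumption: communicating synchronises beliefs, so the divergence
    # penalty disappears. The value of communicating is the reward
    # recovered by doing so; communicate only if it exceeds the cost.
    value = joint_reward - shaped_reward(
        joint_reward, own_belief, team_belief, weight
    )
    return value > cost
```

Under these assumptions, an agent whose belief matches the one it attributes to its team gains nothing from sending a message, while an agent whose belief has diverged communicates only once the penalty it would otherwise incur outweighs the communication cost.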


