Learning sequences of actions in collectives of autonomous agents
Source: Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 1
Bologna, Italy
Session 3C: evolution, adaptation and learning I
Pages: 378 - 385  
Year of Publication: 2002
ISBN:1-58113-480-0
Authors
Kagan Tumer  NASA Ames Research Center, Moffett Field, CA
Adrian K. Agogino  The University of Texas, Austin, TX
David H. Wolpert  NASA Ames Research Center, Moffett Field, CA
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/544741.544832

ABSTRACT

In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. Directly applying Reinforcement Learning (RL) concepts to multi-agent systems often proves problematic, as agents may work at cross-purposes, have difficulty evaluating their contribution to the achievement of the global objective, or both. Accordingly, the crucial step in designing multi-agent systems is setting the rewards for each agent's RL algorithm so that, as the agents attempt to maximize those rewards, the system reaches a globally "desirable" solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence [15,23] to design rewards for the agents that are "aligned" with the global reward and are "learnable" in that agents can readily see how their behavior affects their reward. We show that reinforcement learning agents using those rewards outperform both "natural" extensions of single-agent algorithms and global reinforcement learning solutions based on "team games".
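
To make the contrast between a "team game" reward and a reward that is both "aligned" and "learnable" more concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the abstract does not state the reward formula, so the sketch assumes a difference-style reward, in which an agent receives the global reward minus the global reward recomputed with that agent removed. The grid cells, token values, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) position in a toy grid world


def global_reward(token_values: Dict[Cell, float], paths: List[List[Cell]]) -> float:
    """Sum the values of all distinct token cells visited by at least one agent."""
    visited: Set[Cell] = set()
    for path in paths:
        visited.update(path)
    return sum(value for cell, value in token_values.items() if cell in visited)


def team_game_rewards(token_values: Dict[Cell, float], paths: List[List[Cell]]) -> List[float]:
    """Team game: every agent receives the full global reward."""
    g = global_reward(token_values, paths)
    return [g] * len(paths)


def difference_rewards(token_values: Dict[Cell, float], paths: List[List[Cell]]) -> List[float]:
    """Assumed difference-style reward: global reward with the agent present
    minus global reward with that agent's path removed."""
    g = global_reward(token_values, paths)
    rewards = []
    for i in range(len(paths)):
        others = paths[:i] + paths[i + 1:]
        rewards.append(g - global_reward(token_values, others))
    return rewards


# Hypothetical example: three tokens, three agents, one path (action sequence) each.
tokens = {(0, 1): 1.0, (1, 1): 2.0, (2, 2): 5.0}
paths = [
    [(0, 0), (0, 1), (1, 1)],  # agent 0 collects two tokens
    [(1, 0), (1, 1)],          # agent 1 duplicates a token agent 0 also reaches
    [(2, 1), (2, 2)],          # agent 2 collects the high-value token alone
]

print(team_game_rewards(tokens, paths))   # [8.0, 8.0, 8.0]
print(difference_rewards(tokens, paths))  # [1.0, 0.0, 5.0]
```

In this toy case the team-game reward gives every agent the same signal, so an individual agent cannot easily tell how its own path mattered. Under the assumed difference-style reward, agent 1 receives 0 because the token it reaches would have been collected anyway, while agent 2 is credited with the full value of the token only it reaches.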


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
C. Boutilier. Multiagent systems: Challenges and opportunities for decision theoretic planning. AI Magazine, 20:35--43, winter 1999.
 
2
J. A. Boyan and M. Littman. Packet routing in dynamically changing networks: A reinforcement learning approach. In Advances in Neural Information Processing Systems - 6, pages 671--678. Morgan Kaufmann, 1994.
 
3
 
4
R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems - 8, pages 1017--1023. MIT Press, 1996.
 
5
A. Greenwald, E. Friedman, and S. Shenker. Learning in network contexts: Experimental results from simulations. Journal of Games and Economic Behavior: Special Issue on Economics and Artificial Intelligence, 35(1/2):80--123, 2001.
 
6
T. Groves. Incentives in teams. Econometrica, 41:617--631, 1973.
 
7
G. Hardin. The tragedy of the commons. Science, 162:1243--1248, 1968.
 
8
 
9
 
10
W. Nicholson. Microeconomic Theory. The Dryden Press, seventh edition, 1998.
 
11
T. Sandholm and R. Crites. Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37:147--166, 1995.
 
12
S. Sen. Multi-Agent Learning: Papers from the 1997 AAAI Workshop (Technical Report WS-97-03). AAAI Press, Menlo Park, CA, 1997.
 
13
 
14
 
15
 
16
W. Vickrey. Counterspeculation, auctions and competitive sealed tenders. Journal of Finance, 16:8--37, 1961.
 
17
 
18
M. P. Wellman. A market-oriented programming environment and its application to distributed multicommodity flow problems. Journal of Artificial Intelligence Research, 1993.
19
 
20
D. H. Wolpert and K. Tumer. An Introduction to Collective Intelligence. Technical Report NASA-ARC-IC-99-63, NASA Ames Research Center, 1999. URL: http://ic.arc.nasa.gov/ic/projects/coin_pubs.html. To appear in Handbook of Agent Technology, Ed. J. M. Bradshaw, AAAI/MIT Press.
 
21
D. H. Wolpert and K. Tumer. Optimal payoff functions for members of collectives. Advances in Complex Systems, 4(2/3):265--279, 2001.
 
22
 
23
D. H. Wolpert, K. Wheeler, and K. Tumer. Collective intelligence for control of distributed dynamical systems. Europhysics Letters, 49(6), March 2000.
 
24
W. Zhang and T. G. Dietterich. Solving combinatorial optimization tasks by reinforcement learning: A general methodology applied to resource-constrained scheduling. Journal of Artificial Intelligence Research, 2000.
