ABSTRACT
In this paper we focus on the problem of designing a collective of autonomous agents that individually learn sequences of actions such that the resultant sequence of joint actions achieves a predetermined global objective. Directly applying Reinforcement Learning (RL) concepts to multi-agent systems often proves problematic, as agents may work at cross-purposes, have difficulty evaluating their contribution to the global objective, or both. Accordingly, the crucial step in designing multi-agent systems is setting the rewards for each agent's RL algorithm so that, as the agents attempt to maximize those rewards, the system reaches a globally "desirable" solution. In this work we consider a version of this problem involving multiple autonomous agents in a grid world. We use concepts from collective intelligence [15,23] to design rewards for the agents that are "aligned" with the global reward and "learnable", in that agents can readily see how their behavior affects their reward. We show that reinforcement learning agents using those rewards outperform both "natural" extensions of single-agent algorithms and global reinforcement learning solutions based on "team games".
REFERENCES
[1] C. Boutilier. Multiagent systems: Challenges and opportunities for decision theoretic planning. AI Magazine, 20:35-43, winter 1999.
[2] J. A. Boyan and M. Littman. Packet routing in dynamically changing networks: A reinforcement learning approach. In Advances in Neural Information Processing Systems - 6, pages 671-678. Morgan Kaufmann, 1994.
[3]
[4] R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems - 8, pages 1017-1023. MIT Press, 1996.
[5] A. Greenwald, E. Friedman, and S. Shenker. Learning in network contexts: Experimental results from simulations. Journal of Games and Economic Behavior: Special Issue on Economics and Artificial Intelligence, 35(1/2):80-123, 2001.
[6] T. Groves. Incentives in teams. Econometrica, 41:617-631, 1973.
[7] G. Hardin. The tragedy of the commons. Science, 162:1243-1248, 1968.
[8]
[9]
[10] W. Nicholson. Microeconomic Theory. The Dryden Press, seventh edition, 1998.
[11] T. Sandholm and R. Crites. Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37:147-166, 1995.
[12] S. Sen. Multi-Agent Learning: Papers from the 1997 AAAI Workshop (Technical Report WS-97-03). AAAI Press, Menlo Park, CA, 1997.
[13]
[14]
[15]
[16] W. Vickrey. Counterspeculation, auctions and competitive sealed tenders. Journal of Finance, 16:8-37, 1961.
[17]
[18] M. P. Wellman. A market-oriented programming environment and its application to distributed multicommodity flow problems. Journal of Artificial Intelligence Research, 1993.
[19] D. H. Wolpert, S. Kirshner, C. J. Merz, and K. Tumer. Adaptivity in agent-based routing for data networks. In Proceedings of the Fourth International Conference on Autonomous Agents, pages 396-403, Barcelona, Spain, June 3-7, 2000. doi:10.1145/336595.337552
[20] D. H. Wolpert and K. Tumer. An Introduction to Collective Intelligence. Technical Report NASA-ARC-IC-99-63, NASA Ames Research Center, 1999. URL: http://ic.arc.nasa.gov/ic/projects/coin_pubs.html. To appear in Handbook of Agent Technology, Ed. J. M. Bradshaw, AAAI/MIT Press.
[21] D. H. Wolpert and K. Tumer. Optimal payoff functions for members of collectives. Advances in Complex Systems, 4(2/3):265-279, 2001.
[22]
[23] D. H. Wolpert, K. Wheeler, and K. Tumer. Collective intelligence for control of distributed dynamical systems. Europhysics Letters, 49(6), March 2000.
[24] W. Zhang and T. G. Dietterich. Solving combinatorial optimization tasks by reinforcement learning: A general methodology applied to resource-constrained scheduling. Journal of Artificial Intelligence Research, 2000.
CITED BY 10
H. Santana, G. Ramalho, V. Corruble, and B. Ratitch. Multi-Agent Patrolling with Reinforcement Learning. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, pages 1122-1129, New York, New York, July 19-23, 2004.