ACM Home Page
Please provide us with feedback. Feedback
Efficient learning of multi-step best response
Full text PdfPdf (307 KB)
Source International Conference on Autonomous Agents archive
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems table of contents
The Netherlands
SESSION: Papers: learning table of contents
Pages: 60 - 66  
Year of Publication: 2005
ISBN:1-59593-093-0
Authors
Bikramjit Banerjee  Tulane University, New Orleans, LA
Jing Peng  Tulane University, New Orleans, LA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 25,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1082473.1082483
What is a DOI?

ABSTRACT

We provide a uniform framework for learning against a recent history adversary in arbitrary repeated bimatrix games, by modeling such an agent as a Markov Decision Process. We focus on learning an optimal non-stationary policy in such an MDP over a finite horizon and adapt an existing efficient Monte Carlo based algorithm for learning optimal policies in such MDPs. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, a simple experiment in the Prisoner's Dilemma game shows that even when no extra domain knowledge (besides that the opponent's memory size is known) is assumed, the error can still be small.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. Axelrod. The Evolution of Cooperation. Basic Books, 1984.
 
2
J. A. Bagnell, S. Kakade, A. Y. Ng, and J. Schneider. Policy search by dynamic programming. In Advances in Neural Information Processing Systems 16. MIT Press, 2003.
 
3
B. Banerjee and J. Peng. Performance bounded reinforcement learning in strategic interactions. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI-04), pages 2--7, San Jose, CA, 2004. AAAI Press.
 
4
 
5
 
6
 
7
D. Carmel and S. Markovitch. Learning models of intelligent agents. In Thirteenth National Conference on Artificial Intelligence, pages 62--67, Menlo Park, CA, 1996. AAAI Press/MIT Press.
 
8
 
9
D. Carmel and S. Markovitch. Model-based learning of interaction strategies in multiagent systems. Journal of Experimental and Theoretical Artificial Intelligence, 10(3):309 -- 332, 1998.
 
10
V. Conitzer and T. Sandholm. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. In Proceedings of the 20th International Conference on Machine Learning, 2003.
11
 
12
13
 
14
 
15
S. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, 2003.
 
16
M. Kearns, Y. Mansour, and A. Ng. Approximate planning in large pomdps via reusable trajectories. In Advances in Neural Information Processing Systems 12. MIT Press, 2000.
 
17
 
18
 
19
 
20
G. Tesauro. Extending Q-learning to general adaptive multi-agent systems. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.


Collaborative Colleagues:
Bikramjit Banerjee: colleagues
Jing Peng: colleagues