A social reinforcement learning agent
Source: International Conference on Autonomous Agents
Proceedings of the Fifth International Conference on Autonomous Agents
Montreal, Quebec, Canada
Pages: 377 - 384  
Year of Publication: 2001
ISBN:1-58113-326-X
Authors
Charles Isbell  AT&T Labs, 180 Park Avenue, Florham Park, NJ
Christian R. Shelton  AT&T Labs, 180 Park Avenue, Florham Park, NJ
Michael Kearns  AT&T Labs, 180 Park Avenue, Florham Park, NJ
Satinder Singh  AT&T Labs, 180 Park Avenue, Florham Park, NJ
Peter Stone  AT&T Labs, 180 Park Avenue, Florham Park, NJ
Sponsor
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/375735.376334

ABSTRACT

We report on our reinforcement learning work on Cobot, a software agent that resides in the well-known online chat community LambdaMOO. Our initial work on Cobot [cobotaaai] provided him with the ability to collect social statistics and report them to users in a reactive manner. Here we describe our application of reinforcement learning to allow Cobot to proactively take actions in this complex social environment, and adapt his behavior from multiple sources of human reward. After 5 months of training, Cobot received 3171 reward and punishment events from 254 different LambdaMOO users, and learned nontrivial preferences for a number of users. Cobot modifies his behavior based on his current state in an attempt to maximize reward. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.
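
The abstract does not say which learning algorithm Cobot uses, so the following is only a generic illustration of the idea of choosing actions from the current state and adjusting behavior from reward and punishment events. It is a minimal REINFORCE-style sketch of a linear softmax policy, assuming a small discrete action set and a numeric state vector. The class name `SoftmaxPolicy`, the feature layout, and the step size are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(prefs):
    """Turn real-valued action preferences into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxPolicy:
    """Linear softmax policy over discrete actions (illustrative only)."""

    def __init__(self, n_features, n_actions, step_size=0.1):
        # One weight vector per action; all zeros gives a uniform policy.
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.step_size = step_size

    def probabilities(self, state):
        prefs = [sum(w * x for w, x in zip(row, state)) for row in self.weights]
        return softmax(prefs)

    def act(self, state, rng=random):
        probs = self.probabilities(state)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, reward):
        # Gradient of log pi(action | state) with respect to weights[b]
        # is (1[b == action] - pi(b | state)) * state.
        probs = self.probabilities(state)
        for b, row in enumerate(self.weights):
            indicator = 1.0 if b == action else 0.0
            for i, x in enumerate(state):
                row[i] += self.step_size * reward * (indicator - probs[b]) * x


# Hypothetical usage: a reward of +1 makes the rewarded action more likely
# in that state, and a punishment of -1 makes the punished action less likely.
policy = SoftmaxPolicy(n_features=2, n_actions=3)
state = [1.0, 0.0]
policy.update(state, action=2, reward=1.0)
policy.update(state, action=0, reward=-1.0)
```

In a setting with many users, each reward or punishment event would supply the scalar `reward` for the action that preceded it. How rewards from different users are combined is left open in this sketch.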


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1. Eisenberg, A. (2000). Find Me a File, Cache Me a Catch. New York Times, February 10, 2000. http://www.nytimes.com/library/tech/00/02/circuits/articles/10matc.html
5. Shelton, C. R. (2000). Balancing Multiple Sources of Reward in Reinforcement Learning. Submitted for publication in Neural Information Processing Systems-2000.
9. Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems-1999.
