A social reinforcement learning agent
Source
International Conference on Autonomous Agents
Proceedings of the fifth international conference on Autonomous agents
Montreal, Quebec, Canada
Pages: 377-384
Year of Publication: 2001
ISBN: 1-58113-326-X
Authors
Charles Isbell, AT&T Labs, 180 Park Avenue, Florham Park, NJ
Christian R. Shelton, AT&T Labs, 180 Park Avenue, Florham Park, NJ
Michael Kearns, AT&T Labs, 180 Park Avenue, Florham Park, NJ
Satinder Singh, AT&T Labs, 180 Park Avenue, Florham Park, NJ
Peter Stone, AT&T Labs, 180 Park Avenue, Florham Park, NJ
Bibliometrics
Downloads (6 Weeks): 11, Downloads (12 Months): 46, Citation Count: 13
ABSTRACT
We report on our reinforcement learning work on Cobot, a software agent that resides in the well-known online chat community LambdaMOO. Our initial work on Cobot [3] provided him with the ability to collect social statistics and report them to users in a reactive manner. Here we describe our application of reinforcement learning to allow Cobot to proactively take actions in this complex social environment, and adapt his behavior from multiple sources of human reward. After 5 months of training, Cobot received 3171 reward and punishment events from 254 different LambdaMOO users, and learned nontrivial preferences for a number of users. Cobot modifies his behavior based on his current state in an attempt to maximize reward. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.
REFERENCES
1. Eisenberg, A. (2000). Find Me a File, Cache Me a Catch. New York Times, February 10, 2000. http://www.nytimes.com/library/tech/00/02/circuits/articles/10matc.html.
3. Isbell, C. L., Jr., Kearns, M. J., Kormann, D., Singh, S. P., and Stone, P. (2000). Cobot in LambdaMOO: A Social Statistics Agent. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pp. 36-41, July 30-August 03, 2000.
5. Shelton, C. R. (2000). Balancing Multiple Sources of Reward in Reinforcement Learning. Submitted for publication in Neural Information Processing Systems-2000.
9. Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems-1999.
CITED BY 13
Michael Kearns, Charles Isbell, Satinder Singh, Diane Litman, Jessica Howe, CobotDS: a spoken dialogue system for chat, Eighteenth national conference on Artificial intelligence, p.425-430, July 28-August 01, 2002, Edmonton, Alberta, Canada
Olufisayo Omojokun, Charles Lee Isbell, Jr., User modeling for personalized universal appliance interaction, Proceedings of the 2003 conference on Diversity in computing, p.65-68, October 15-18, 2003, Atlanta, Georgia, USA
Sooraj Bhat, David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas, A globally optimal algorithm for TTD-MDPs, Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, May 14-18, 2007, Honolulu, Hawaii
Elizabeth S. Kim, Dan Leyzberg, Katherine M. Tsui, Brian Scassellati, How people talk when teaching a robot, Proceedings of the 4th ACM/IEEE international conference on Human robot interaction, March 09-13, 2009, La Jolla, California, USA
David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas, Michael L. Littman, Targeting specific distributions of trajectories in MDPs, Proceedings of the 21st national conference on Artificial intelligence, p.1213-1218, July 16-20, 2006, Boston, Massachusetts