ABSTRACT
As the reach of multiagent reinforcement learning extends to increasingly complex tasks, it is likely that the diverse challenges encountered can only be surmounted by combining the strengths of different learning methods. We consider this aspect of learning through the case study of Keepaway, a popular benchmark for multiagent reinforcement learning from the robot soccer domain. Whereas previous successful results in this domain have limited learning to an isolated, infrequent decision that amounts to a turn-taking behavior (Pass), we expand the agents' learning capability to include the more ubiquitous action of moving without the ball (GetOpen), such that at any given time, multiple agents are executing learned behaviors simultaneously. We introduce a policy search method for learning GetOpen to complement the temporal difference learning approach employed for learning Pass [4]. The learned GetOpen policy matches the best hand-coded policy for this task, and outperforms the best policy found when Pass is learned. We demonstrate that Pass and GetOpen can be learned simultaneously, and indeed that these learned behaviors specialize towards the counterpart behaviors with which they are trained.