ABSTRACT
Learning in multiagent systems is generally slow because an agent has to extract its correct policy not only through its interaction with the environment, but also through its interactions with other learning agents. In this paper, we present an approach that significantly improves the learning speed in multiagent systems by allowing an agent to update its estimate of the rewards for all its available actions, not just the action that was taken. Our results show that the rewards on such "actions not taken" are beneficial early in training, particularly when agent teams are leveraged to estimate those rewards.