ABSTRACT
Traditionally, research in the reinforcement learning (RL) community has been devoted to developing domain-independent algorithms such as SARSA [13], Q-learning [16], prioritized sweeping [8], or LSPI [6], that are designed to work for any given state space and action space. However, the modus operandi in RL research has been for a human expert to re-code each learning environment, including defining the actions and state features, as well as specifying the algorithm to be used. Typically each new RL experiment is run by explicitly calling a new program (even when learning can be biased by previous learning experiences, as in transfer learning [10, 15, 14]). Thus, while standards have developed for describing and testing individual RL algorithms (e.g., RL-Glue [17]), no such standards have developed for the problem of describing complete tasks to a preexisting agent.