Uncertainty handling CMA-ES for reinforcement learning
Source
Genetic and Evolutionary Computation Conference
Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation
Montreal, Québec, Canada
SESSION: Track 11: genetics-based machine learning
Pages 1211-1218
Year of Publication: 2009
ISBN:978-1-60558-325-9
Authors
Verena Heidrich-Meisner  Ruhr-Universität, Bochum, Germany
Christian Igel  Ruhr-Universität, Bochum, Germany
Sponsors
SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1569901.1570064

ABSTRACT

The covariance matrix adaptation evolution strategy (CMA-ES) has proven to be a powerful method for reinforcement learning (RL). Recently, the CMA-ES has been augmented with an adaptive uncertainty handling mechanism. Because uncertainty is a typical property of RL problems, this new algorithm, termed UH-CMA-ES, is promising for RL. The UH-CMA-ES dynamically adjusts the number of episodes considered in each evaluation of a policy. It controls the signal-to-noise ratio such that it is just high enough for a sufficiently good ranking of candidate policies, which in turn allows the evolutionary learning to find better solutions. This significantly increases the learning speed as well as the robustness without impairing the quality of the final solutions. We evaluate the UH-CMA-ES on fully and partially observable Markov decision processes with random start states and noisy observations. A canonical natural policy gradient method and random search serve as baselines for comparison.
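
The abstract's central idea, adjusting the number of episodes per policy evaluation so that the ranking of candidates is just reliable enough, can be illustrated with a small sketch. The Python code below is not the published UH-CMA-ES. It is a simplified stand-in that assumes a toy noise model (`noisy_return`), evaluates a population twice, measures how much the ranking changes, and grows or shrinks the episode budget accordingly. All function names, the threshold, and the growth factor are hypothetical choices made for illustration only.

```python
import random


def noisy_return(policy_quality, rng):
    """Hypothetical stand-in for one RL episode: true quality plus unit Gaussian noise."""
    return policy_quality + rng.gauss(0.0, 1.0)


def evaluate(policy_quality, n_episodes, rng):
    """Average the noisy return over n_episodes episodes."""
    total = sum(noisy_return(policy_quality, rng) for _ in range(n_episodes))
    return total / n_episodes


def ranks(values):
    """Return the rank of each value (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def mean_rank_change(first, second):
    """Mean absolute rank difference between two evaluations of the same population."""
    r1, r2 = ranks(first), ranks(second)
    return sum(abs(a - b) for a, b in zip(r1, r2)) / len(first)


def adapt_episodes(n_episodes, rank_change, threshold=1.0, factor=1.5,
                   n_min=1, n_max=1000):
    """Assumed rule: raise the episode budget if the ranking is unstable, else lower it."""
    if rank_change > threshold:
        return min(n_max, max(n_episodes + 1, int(n_episodes * factor)))
    return max(n_min, int(n_episodes / factor))


if __name__ == "__main__":
    rng = random.Random(0)
    population = [0.0, 0.1, 0.2, 0.3, 0.4]  # hypothetical true policy qualities
    n = 1
    for generation in range(10):
        first = [evaluate(q, n, rng) for q in population]
        second = [evaluate(q, n, rng) for q in population]
        n = adapt_episodes(n, mean_rank_change(first, second))
        print(generation, n)
```

In this sketch the budget rises only while repeated evaluations disagree about the ordering of candidates, which mirrors the abstract's goal of spending no more episodes than a sufficiently good ranking requires. A real implementation would couple such a rule to the CMA-ES sampling and update steps.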

