Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search
Source ACM International Conference Proceeding Series; Vol. 382 archive
Proceedings of the 26th Annual International Conference on Machine Learning table of contents
Montreal, Quebec, Canada
Pages 401-408  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Verena Heidrich-Meisner  Institut für Neuroinformatik, Bochum, Germany
Christian Igel  Institut für Neuroinformatik, Bochum, Germany
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553426

ABSTRACT

Uncertainty arises in reinforcement learning from various sources, so evaluating a behavioral policy requires statistics based on several roll-outs. We add adaptive uncertainty handling based on Hoeffding and empirical Bernstein races to the CMA-ES, a variable-metric evolution strategy proposed for direct policy search. The uncertainty handling individually adjusts the number of episodes used to evaluate each policy. The performance estimate is kept just accurate enough to rank the candidate policies sufficiently well, which in turn is sufficient for the CMA-ES to find better solutions. This increases both the learning speed and the robustness of the algorithm.
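
To make the racing idea concrete, the following is a minimal sketch of a generic Hoeffding race that picks the mu best of several candidates while spending further episodes only on candidates whose rank is still undecided. It is not the procedure from the paper: the names `race`, `hoeffding_radius`, `evaluate`, `mu`, `value_range` and `max_rounds` are hypothetical, returns are assumed to lie in an interval of known width, the confidence level `delta` is applied per interval without the union-bound correction a rigorous race would need, and neither the empirical Bernstein variant nor the coupling to the CMA-ES is shown.

```python
import math
import random


def hoeffding_radius(n, value_range, delta):
    # Two-sided Hoeffding bound: with probability >= 1 - delta, the true mean
    # lies within this distance of the empirical mean of n bounded samples.
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def race(evaluate, candidates, mu, value_range, delta=0.05, max_rounds=200):
    """Return (sorted indices of the mu selected candidates, episodes used)."""
    k = len(candidates)
    sums = [0.0] * k
    counts = [0] * k
    undecided = set(range(k))
    selected = set()
    for _ in range(max_rounds):
        # One more episode for every candidate whose rank is still unclear.
        for i in undecided:
            sums[i] += evaluate(candidates[i])
            counts[i] += 1
        means = [sums[i] / counts[i] for i in range(k)]
        radii = [hoeffding_radius(counts[i], value_range, delta) for i in range(k)]
        lower = [m - r for m, r in zip(means, radii)]
        upper = [m + r for m, r in zip(means, radii)]
        for i in list(undecided):
            beats = sum(1 for j in range(k) if j != i and lower[i] > upper[j])
            beaten_by = sum(1 for j in range(k) if j != i and upper[i] < lower[j])
            if beats >= k - mu and len(selected) < mu:
                # Confidently better than all candidates that must be dropped.
                selected.add(i)
                undecided.discard(i)
            elif beaten_by >= mu:
                # At least mu candidates are confidently better: stop evaluating.
                undecided.discard(i)
        if len(selected) == mu or not undecided:
            break
    if len(selected) < mu:
        # Budget exhausted: fill the remaining slots by empirical mean.
        rest = sorted((i for i in range(k) if i not in selected),
                      key=lambda i: sums[i] / counts[i], reverse=True)
        selected.update(rest[: mu - len(selected)])
    return sorted(selected), counts


if __name__ == "__main__":
    rng = random.Random(0)
    expected_returns = [0.1, 0.5, 0.9]  # hypothetical policies, returns in [0, 1]
    noisy_return = lambda p: p + rng.uniform(-0.1, 0.1)
    best, episodes = race(noisy_return, expected_returns, mu=1, value_range=1.0)
    print(best, episodes)
```

In this sketch a clearly inferior candidate stops receiving episodes as soon as its upper bound falls below the lower bound of mu others, so the evaluation effort concentrates on candidates that are hard to tell apart.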

