Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search
Source
ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 401-408
Year of Publication: 2009
ISBN: 978-1-60558-516-1
ABSTRACT
Uncertainty arises in reinforcement learning from various sources, so behavioral policies must be evaluated using statistics based on several roll-outs. We add adaptive uncertainty handling based on Hoeffding and empirical Bernstein races to the CMA-ES, a variable metric evolution strategy proposed for direct policy search. The uncertainty handling individually adjusts the number of episodes considered for the evaluation of each policy. The performance estimate is kept just accurate enough for a sufficiently good ranking of candidate policies, which in turn is sufficient for the CMA-ES to find better solutions. This increases the learning speed as well as the robustness of the algorithm.
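The racing idea in the abstract can be illustrated with a minimal Python sketch. It is not the procedure from the paper. It assumes returns lie in a known interval of width `value_range`, that a higher return is better, and that the standard two-sided Hoeffding bound and the common empirical Bernstein bound (with constants 2 and 3) are used. The names `race`, `evaluate`, `n_select`, and `max_episodes` are hypothetical, and the sketch omits the union-bound correction of `delta` over candidates and rounds that a careful implementation would need.

```python
import math
import statistics


def hoeffding_radius(samples, value_range, delta):
    """Half-width of a two-sided Hoeffding interval at confidence 1 - delta."""
    n = len(samples)
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def bernstein_radius(samples, value_range, delta):
    """Half-width of an empirical Bernstein interval at confidence 1 - delta."""
    n = len(samples)
    std = statistics.pstdev(samples)  # empirical (1/n) standard deviation
    log_term = math.log(3.0 / delta)
    return std * math.sqrt(2.0 * log_term / n) + 3.0 * value_range * log_term / n


def race(evaluate, candidates, n_select, value_range,
         delta=0.05, max_episodes=200, radius=bernstein_radius):
    """Pick n_select candidates; return (selected indices, episodes per candidate).

    Assumption: evaluate(candidate) runs one episode and returns its return.
    """
    samples = [[evaluate(c)] for c in candidates]
    undecided = set(range(len(candidates)))
    selected = set()
    n_total = len(candidates)

    for _ in range(max_episodes - 1):
        if not undecided:
            break
        # Only candidates whose rank is still uncertain get another episode.
        for i in undecided:
            samples[i].append(evaluate(candidates[i]))
        bounds = []
        for s in samples:
            mean = statistics.fmean(s)
            r = radius(s, value_range, delta)
            bounds.append((mean - r, mean + r))
        for i in list(undecided):
            low_i, high_i = bounds[i]
            beats = sum(1 for j in range(n_total)
                        if j != i and low_i > bounds[j][1])
            beaten_by = sum(1 for j in range(n_total)
                            if j != i and high_i < bounds[j][0])
            if beats >= n_total - n_select:
                selected.add(i)        # confidently among the best n_select
                undecided.discard(i)
            elif beaten_by >= n_select:
                undecided.discard(i)   # confidently not among the best

    # Budget exhausted: fill any remaining slots by empirical mean.
    rest = sorted(undecided, key=lambda i: statistics.fmean(samples[i]),
                  reverse=True)
    selected.update(rest[:max(0, n_select - len(selected))])
    return sorted(selected), [len(s) for s in samples]
```

A call such as `race(run_episode, population, n_select=3, value_range=1.0)` would return the indices of the selected candidates together with the number of episodes spent on each, where `run_episode` and `population` stand in for a user-supplied roll-out function and candidate list. Candidates that are clearly good or clearly bad stop consuming episodes early, which is the per-policy adaptation the abstract describes.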