ABSTRACT
Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult, if not impossible, to apply them within structured domains, in which, for example, the number of objects and the relations among them vary. In this paper, we describe a non-parametric policy gradient approach, called NPPG, that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
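To make the key idea concrete, the following is a minimal sketch of a policy represented as a weighted sum of regression models grown stage by stage. It is not the NPPG algorithm as specified in the paper: it assumes a hypothetical one-step toy task (a state `x` in [0, 1], two actions, reward 1 when the action matches `x > 0.5`), a softmax policy over a potential function, exact pointwise gradients of the expected reward, a hand-written regression stump as the regression learner, and a fixed step size `eta`. All function names and parameter values are illustrative.

```python
import math
import random

ACTIONS = (0, 1)

def fit_stump(xs, ys):
    """Least-squares regression stump on a 1-D input: (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant model
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]

def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def reward(x, a):
    # Assumed toy reward: action 1 is correct when x > 0.5, action 0 otherwise.
    return 1.0 if a == int(x > 0.5) else 0.0

def potential(models, x, a):
    # The potential for action a is a weighted sum of the regression models grown so far.
    return sum(eta * stump_predict(s, x) for eta, s in models[a])

def policy(models, x):
    # Softmax over the potentials (shifted by the max for numerical stability).
    psi = [potential(models, x, a) for a in ACTIONS]
    m = max(psi)
    w = [math.exp(p - m) for p in psi]
    z = sum(w)
    return [wi / z for wi in w]

def boost(n_stages=20, n_samples=200, eta=5.0, seed=0):
    rng = random.Random(seed)
    models = {a: [] for a in ACTIONS}
    for _ in range(n_stages):
        xs = [rng.random() for _ in range(n_samples)]
        grads = {a: [] for a in ACTIONS}
        for x in xs:
            pi = policy(models, x)
            baseline = sum(pi[b] * reward(x, b) for b in ACTIONS)
            for a in ACTIONS:
                # Pointwise gradient of the expected reward w.r.t. the potential at (x, a).
                grads[a].append(pi[a] * (reward(x, a) - baseline))
        for a in ACTIONS:
            # Fit a regression model to the pointwise gradients and add it to the sum.
            models[a].append((eta, fit_stump(xs, grads[a])))
    return models

if __name__ == "__main__":
    learned = boost()
    for x in (0.1, 0.9):
        print(x, [round(p, 3) for p in policy(learned, x)])
```

Because each stage only requires fitting a regressor to pointwise gradient values, the stump could in principle be swapped for any other regression learner suited to the domain, which is the property the abstract relies on for handling propositional, continuous, and relational inputs in a unified way.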