Non-parametric policy gradients: a unified treatment of propositional and relational domains
Source: ICML; Vol. 307
Proceedings of the 25th International Conference on Machine Learning
Helsinki, Finland
Pages 456-463
Year of Publication: 2008
ISBN: 978-1-60558-205-4
Authors
Kristian Kersting  Fraunhofer IAIS, Sankt Augustin, Germany
Kurt Driessens  Katholieke Universiteit Leuven, Heverlee, Belgium
Sponsors
Yahoo!
Xerox
IBM
NSF
Microsoft Research
Machine Learning Journal/Springer
Pascal
University of Helsinki
Federation of Finnish Learned Societies
Intel Corporation
Google
Helsinki Institute for Information Technology
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390156.1390214

ABSTRACT

Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult - if not impossible - to apply them within structured domains, in which, e.g., the number of objects and the relations among them vary. In this paper, we describe a non-parametric policy gradient approach - called NPPG - that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
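
The key idea in the abstract, a policy built as a weighted sum of regression models added stage by stage, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a Gibbs (softmax) policy over a potential function, a one-step reward instead of a full return, and a toy lookup-table regressor standing in for an off-the-shelf regression learner. The names `softmax_policy`, `TableRegressor`, and `boost_policy` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_policy(psi, state, actions):
    """Assumed Gibbs policy: action probabilities proportional to exp(psi(state, action))."""
    scores = [psi(state, a) for a in actions]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TableRegressor:
    """Toy stand-in for a regression learner: averages the targets seen per key."""

    def fit(self, examples):
        sums, counts = {}, {}
        for key, target in examples:
            sums[key] = sums.get(key, 0.0) + target
            counts[key] = counts.get(key, 0) + 1
        self.means = {k: sums[k] / counts[k] for k in sums}
        return self

    def predict(self, key):
        return self.means.get(key, 0.0)


def boost_policy(reward, states, actions, stages=30, step=0.5, samples=50, seed=0):
    """Grow the potential psi as a weighted sum of regressors, one per stage."""
    rng = random.Random(seed)
    models = []  # list of (weight, fitted regressor) pairs

    def psi(state, action):
        return sum(w * m.predict((state, action)) for w, m in models)

    for _ in range(stages):
        examples = []
        for _ in range(samples):
            s = rng.choice(states)
            probs = softmax_policy(psi, s, actions)
            baseline = sum(p * reward(s, b) for p, b in zip(probs, actions))
            for p, a in zip(probs, actions):
                # Pointwise gradient of the expected one-step reward
                # with respect to psi(s, a) under the softmax policy.
                examples.append(((s, a), p * (reward(s, a) - baseline)))
        # Fit a regressor to the gradient examples and add it to the sum.
        models.append((step, TableRegressor().fit(examples)))
    return psi, models


if __name__ == "__main__":
    # Hypothetical toy task: action 0 is rewarded in even states, action 1 in odd states.
    toy_reward = lambda s, a: 1.0 if a == s % 2 else 0.0
    psi, models = boost_policy(toy_reward, [0, 1, 2, 3], [0, 1])
    for s in [0, 1, 2, 3]:
        print(s, softmax_policy(psi, s, [0, 1]))
```

Because each stage only requires fitting a regressor to (example, gradient) pairs, the table learner here could in principle be swapped for any regression learner that accepts the chosen state representation, which is the property the abstract relies on to cover propositional, continuous, and relational domains.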


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1. Aberdeen, D. (2006). Policy-gradient methods for planning. Advances in Neural Information Processing Systems 18 (pp. 9--17).

3. Bagnell, J., & Schneider, J. (2003). Policy search in reproducing kernel Hilbert space (Technical Report CMU-RI-TR-03-45). Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA.

4. Baxter, J., Bartlett, P., & Weaver, L. (2001). Experiments with infinite-horizon, policy-gradient estimation. Journal of Artificial Intelligence Research (JAIR), 15, 351--381.

7. Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Belmont: Wadsworth.

8. Chapman, D., & Kaelbling, L. P. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. Proceedings of the 12th International Joint Conference on Artificial Intelligence (pp. 726--731). Sydney, Australia.

12. Driessens, K., & Ramon, J. (2003). Relational instance based regression for relational reinforcement learning. Proceedings of the 20th International Conference on Machine Learning (pp. 123--130). Washington, DC, USA.

16. Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189--1232.

17. Gärtner, T., Driessens, K., & Ramon, J. (2003). Graph kernels and Gaussian processes for relational reinforcement learning. International Conference on Inductive Logic Programming (pp. 146--163). Szeged, Hungary.

20. Gutmann, B., & Kersting, K. (2006). TildeCRF: Conditional random fields for logical sequences. Proceedings of the 17th European Conference on Machine Learning (pp. 174--185). Berlin, Germany.

25. Riedmiller, M. (2005). Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method. Proceedings of the 16th European Conference on Machine Learning (pp. 317--328). Porto, Portugal.

26. Sanner, S., & Boutilier, C. (2005). Approximate linear programming for first-order MDPs. Proceedings of the 21st Conference on Uncertainty in AI (UAI) (pp. 509--517). Edinburgh, Scotland.

29. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems 12 (pp. 1057--1063). MIT Press.

31. Wang, C., Joshi, S., & Khardon, R. (2007). First order decision diagrams for relational MDPs. Proceedings of the 20th International Joint Conference on Artificial Intelligence (pp. 1095--1100). Hyderabad, India: AAAI Press.

32. Wang, X., & Dietterich, T. (2003). Model-based policy gradient reinforcement learning. Proceedings of the 20th International Conference on Machine Learning (pp. 776--783). Washington, DC, USA.
