Kernelized value function approximation for reinforcement learning
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 1017-1024
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
Gavin Taylor, Duke University, Durham, NC
Ronald Parr, Duke University, Durham, NC
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553504

ABSTRACT

A recent surge in research in kernelized approaches to reinforcement learning has sought to bring the benefits of kernelized machine learning techniques to reinforcement learning. Kernelized reinforcement learning techniques are fairly new and different authors have approached the topic with different assumptions and goals. Neither a unifying view nor an understanding of the pros and cons of different approaches has yet emerged. In this paper, we offer a unifying view of the different approaches to kernelized value function approximation for reinforcement learning. We show that, except for different approaches to regularization, Kernelized LSTD (KLSTD) is equivalent to a model-based approach that uses kernelized regression to find an approximate reward and transition model, and that Gaussian Process Temporal Difference learning (GPTD) returns a mean value function that is equivalent to these other approaches. We also discuss the relationship between our model-based approach and the earlier Gaussian Processes in Reinforcement Learning (GPRL). Finally, we decompose the Bellman error into the sum of transition error and reward error terms, and demonstrate through experiments that this decomposition can be helpful in choosing regularization parameters.
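
The model-based construction and the Bellman error decomposition described in the abstract can be illustrated with a small numerical sketch. The abstract gives no formulas, so everything below is an assumption made for illustration rather than the authors' implementation: the Gaussian kernel, the ridge-style regularizers reg_r and reg_p, the synthetic data, and all function and variable names. The sketch fits a kernelized regression model of the reward and of the next-state kernel values, solves the fixed point of that approximate model for weights w, and then checks that the Bellman error on the samples splits into a reward error term plus a discounted transition error term.

```python
import numpy as np

def gaussian_kernel(A, B, width=0.3):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 * width^2))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def kernel_model_value(S, S_next, r, gamma=0.9, reg_r=1e-2, reg_p=1e-2):
    """Hypothetical kernelized model-based value estimate on sampled states.

    S, S_next: arrays of shape (n, d) holding states and observed successors.
    r:         array of shape (n,) holding observed rewards.
    reg_r, reg_p: assumed ridge regularizers for the reward and transition models.
    """
    n = len(S)
    K = gaussian_kernel(S, S)            # K[i, j]      = k(s_i, s_j)
    K_next = gaussian_kernel(S_next, S)  # K_next[i, j] = k(s'_i, s_j)

    # Kernelized regression models, evaluated at the sampled states.
    r_hat = K @ np.linalg.solve(K + reg_r * np.eye(n), r)
    K_next_hat = K @ np.linalg.solve(K + reg_p * np.eye(n), K_next)

    # Fixed point of the approximate model: K w = r_hat + gamma * K_next_hat w.
    w = np.linalg.solve(K - gamma * K_next_hat, r_hat)

    # Bellman error on the samples and its two components.
    bellman_error = r + gamma * K_next @ w - K @ w
    reward_error = r - r_hat
    transition_error = gamma * (K_next - K_next_hat) @ w
    return w, bellman_error, reward_error, transition_error

# Synthetic data, invented purely for the demonstration.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(30, 2))
S_next = 0.9 * S + 0.05 * rng.standard_normal(S.shape)
r = -np.linalg.norm(S, axis=1)

w, be, reward_err, trans_err = kernel_model_value(S, S_next, r)
decomposition_holds = np.allclose(be, reward_err + trans_err, atol=1e-6)
print("Bellman error = reward error + transition error:", decomposition_holds)
```

In this sketch the identity follows algebraically from the fixed-point equation, so the check should pass up to floating-point error. Comparing the norms of reward_err and trans_err while varying reg_r and reg_p is one plausible way to see which regularizer is responsible for a large Bellman error, in the spirit of the abstract's last sentence.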


