Online kernel selection for Bayesian reinforcement learning
Source ICML; Vol. 307
Proceedings of the 25th international conference on Machine learning
Helsinki, Finland
Pages 816-823  
Year of Publication: 2008
ISBN: 978-1-60558-205-4
Authors
Joseph Reisinger  The University of Texas at Austin, Austin, TX
Peter Stone  The University of Texas at Austin, Austin, TX
Risto Miikkulainen  The University of Texas at Austin, Austin, TX
Sponsors
Yahoo!
Xerox
IBM
NSF
Microsoft Research
Machine Learning Journal/Springer
Pascal
University of Helsinki
Federation of Finnish Learned Societies
Intel Corporation
Google
Helsinki Institute for Information Technology
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390156.1390259

ABSTRACT

Kernel-based Bayesian methods for Reinforcement Learning (RL) such as Gaussian Process Temporal Difference (GPTD) are particularly promising because they rigorously treat uncertainty in the value function and make it easy to specify prior knowledge. However, the choice of prior distribution significantly affects the empirical performance of the learning agent, and little work has been done extending existing methods for prior model selection to the online setting. This paper develops Replacing-Kernel RL, an online model selection method for GPTD using sequential Monte-Carlo methods. Replacing-Kernel RL is compared to standard GPTD and tile-coding on several RL domains, and is shown to yield significantly better asymptotic performance for many different kernel families. Furthermore, the resulting kernels capture an intuitively useful notion of prior state covariance that may nevertheless be difficult to capture manually.


