Online kernel selection for Bayesian reinforcement learning
Source
ICML; Vol. 307
Proceedings of the 25th international conference on Machine learning
Helsinki, Finland
Pages: 816-823
Year of Publication: 2008
ISBN: 978-1-60558-205-4
ABSTRACT
Kernel-based Bayesian methods for Reinforcement Learning (RL) such as Gaussian Process Temporal Difference (GPTD) are particularly promising because they rigorously treat uncertainty in the value function and make it easy to specify prior knowledge. However, the choice of prior distribution significantly affects the empirical performance of the learning agent, and little work has been done extending existing methods for prior model selection to the online setting. This paper develops Replacing-Kernel RL, an online model selection method for GPTD using sequential Monte-Carlo methods. Replacing-Kernel RL is compared to standard GPTD and tile-coding on several RL domains, and is shown to yield significantly better asymptotic performance for many different kernel families. Furthermore, the resulting kernels capture an intuitively useful notion of prior state covariance that may nevertheless be difficult to capture manually.
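As a rough illustration of what online kernel selection with sequential Monte Carlo can look like, the sketch below keeps a population of candidate kernel hyperparameters, weights each by how well it explains a fresh batch of data, then resamples and perturbs the population. It is not the paper's Replacing-Kernel RL algorithm: it assumes a squared-exponential kernel with a single length scale, and it scores candidates with the log marginal likelihood of plain Gaussian process regression as a stand-in for a GPTD-based fitness. All function names and parameter values here are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, length_scale):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def log_marginal_likelihood(x, y, length_scale, noise=0.1):
    """Log evidence of a zero-mean GP regression model.

    Assumption: used here as a simple stand-in for a GPTD-based score.
    """
    K = rbf_kernel(x, x, length_scale) + noise ** 2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2 * np.pi))

def smc_kernel_selection(x, y, n_particles=50, n_rounds=10, batch=20, seed=0):
    """Resample-and-perturb loop over candidate length scales."""
    rng = np.random.default_rng(seed)
    # Particles are candidate length scales drawn from a log-uniform prior.
    particles = np.exp(rng.uniform(np.log(0.05), np.log(5.0), n_particles))
    for _ in range(n_rounds):
        # Score every particle on a fresh random batch of observations.
        idx = rng.choice(len(x), size=batch, replace=False)
        logw = np.array([log_marginal_likelihood(x[idx], y[idx], p)
                         for p in particles])
        w = np.exp(logw - logw.max())  # subtract the max for numerical stability
        w /= w.sum()
        # Resample in proportion to weight, then jitter in log space
        # so the population keeps exploring nearby length scales.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
        particles = particles * np.exp(rng.normal(0.0, 0.1, n_particles))
    return particles
```

Calling `smc_kernel_selection(x, y)` on noisy one-dimensional data returns the final population of length scales. In a reinforcement learning setting the batch would come from the agent's recent experience and the score from the value-function model, details the abstract does not specify.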