Analyzing feature generation for value-function approximation
Source
ICML; Vol. 227
Proceedings of the 24th international conference on Machine learning
Corvallis, Oregon
Pages: 737 - 744
Year of Publication: 2007
ISBN: 978-1-59593-793-3
ABSTRACT
We analyze a simple, Bellman-error-based approach to generating basis functions for value-function approximation. We show that it generates orthogonal basis functions that provably tighten approximation error bounds. We also illustrate the use of this approach in the presence of noise on some sample problems.
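The abstract gives no algorithmic detail, so the following is only a minimal sketch of one common reading of Bellman-error-based feature generation, not the authors' implementation. It assumes a known model for a fixed policy (transition matrix `P`, reward vector `R`, discount `gamma`), an unweighted least-squares projection, and the reward vector as the first feature. The function names and the four-state chain are hypothetical. Each round solves for the linear fixed point on the current features and appends the normalized Bellman error of that solution as the next feature. Under these assumptions the new feature is orthogonal to the existing ones.

```python
import numpy as np


def linear_fixed_point(Phi, P, R, gamma):
    """Weights w with Phi @ w equal to the projection of R + gamma * P @ Phi @ w.

    Assumption: unweighted (uniform) least-squares projection onto span(Phi).
    """
    A = Phi.T @ (Phi - gamma * (P @ Phi))
    b = Phi.T @ R
    return np.linalg.solve(A, b)


def bellman_error(Phi, w, P, R, gamma):
    """Bellman error R + gamma * P @ V - V of the approximation V = Phi @ w."""
    V = Phi @ w
    return R + gamma * (P @ V) - V


def grow_basis(P, R, gamma, num_features, tol=1e-10):
    """Build up to num_features columns by repeatedly appending the Bellman error."""
    # Assumed starting point: the Bellman error of the all-zero value
    # function is R itself, so R (normalized) is the first feature.
    Phi = (R / np.linalg.norm(R)).reshape(-1, 1)
    while Phi.shape[1] < num_features:
        w = linear_fixed_point(Phi, P, R, gamma)
        err = bellman_error(Phi, w, P, R, gamma)
        norm = np.linalg.norm(err)
        if norm < tol:  # current features already represent V exactly
            break
        Phi = np.column_stack([Phi, err / norm])
    return Phi


# Hypothetical four-state random-walk chain with reward in the last state.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])
R = np.array([0.0, 0.0, 0.0, 1.0])
gamma = 0.9

Phi = grow_basis(P, R, gamma, 3)
# Gram matrix of the generated features; orthonormal columns give the identity.
print(np.round(Phi.T @ Phi, 6))
```

The orthogonality follows from the fixed-point condition: with `A` and `b` as defined in `linear_fixed_point`, the inner products of the features with the Bellman error equal `b - A @ w`, which is zero at the solution. The abstract's discussion of noise concerns a setting where the Bellman error must be estimated from samples rather than computed from a known model; this sketch does not cover that case.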
CITED BY 4
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. Proceedings of the 25th international conference on Machine learning, pp. 752-759, July 5-9, 2008, Helsinki, Finland.