Analyzing feature generation for value-function approximation
Source: ICML; Vol. 227
Proceedings of the 24th International Conference on Machine Learning
Corvallis, Oregon
Pages: 737-744
Year of Publication: 2007
ISBN: 978-1-59593-793-3
Authors
Ronald Parr  Duke University, Durham, NC
Christopher Painter-Wakefield  Duke University, Durham, NC
Lihong Li  Rutgers University, Piscataway, NJ
Michael Littman  Rutgers University, Piscataway, NJ
Sponsor
Machine Learning Journal
Publisher
ACM  New York, NY, USA

DOI: http://doi.acm.org/10.1145/1273496.1273589

ABSTRACT

We analyze a simple, Bellman-error-based approach to generating basis functions for value-function approximation. We show that it generates orthogonal basis functions that provably tighten approximation error bounds. We also illustrate the use of this approach in the presence of noise on some sample problems.
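
As a rough illustration of the idea in the abstract, the sketch below generates basis functions from Bellman errors on a tiny Markov chain. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes an exactly known transition matrix P and reward vector R, an unweighted linear fixed-point solution, and unit-length normalization of each new feature. The function names (`fixed_point_weights`, `bellman_error`, `generate_bebfs`) and the 5-state chain are hypothetical and chosen only for this example.

```python
import numpy as np

def fixed_point_weights(Phi, P, R, gamma):
    """Solve Phi^T (Phi - gamma P Phi) w = Phi^T R for the linear fixed point.

    Assumption: the matrix on the left is nonsingular for the given features.
    """
    A = Phi.T @ (Phi - gamma * (P @ Phi))
    b = Phi.T @ R
    return np.linalg.solve(A, b)

def bellman_error(Phi, w, P, R, gamma):
    """Bellman error of the approximation V = Phi w: R + gamma P V - V."""
    V = Phi @ w
    return R + gamma * (P @ V) - V

def generate_bebfs(P, R, gamma, num_features):
    """Add features one at a time, each the Bellman error of the current fit."""
    n = len(R)
    Phi = np.zeros((n, 0))
    for _ in range(num_features):
        if Phi.shape[1] == 0:
            # With no features the approximation is V = 0, so the error is R.
            err = R.copy()
        else:
            w = fixed_point_weights(Phi, P, R, gamma)
            err = bellman_error(Phi, w, P, R, gamma)
        norm = np.linalg.norm(err)
        if norm < 1e-12:
            break  # the current basis already represents the value function
        # Normalizing is a convenience of this sketch, not part of the source.
        Phi = np.column_stack([Phi, err / norm])
    return Phi

# Hypothetical example: 5-state random walk with reflecting ends,
# reward 1 in the last state.
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, max(i - 1, 0)] += 0.5
    P[i, min(i + 1, n - 1)] += 0.5
R = np.zeros(n)
R[-1] = 1.0

Phi = generate_bebfs(P, R, gamma=0.9, num_features=3)
# The Gram matrix should be close to the identity if the features are orthogonal.
print(np.round(Phi.T @ Phi, 6))
```

In this setting the fixed-point condition forces each Bellman error to be orthogonal to the span of the existing features, which is why the generated columns come out mutually orthogonal. With sampled (noisy) data, as in the experiments the abstract mentions, the errors would have to be estimated and this exact orthogonality would hold only approximately.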


