An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
Source: ICML; Vol. 307
Proceedings of the 25th International Conference on Machine Learning
Helsinki, Finland
Pages 752-759
Year of Publication: 2008
ISBN: 978-1-60558-205-4
Authors
Ronald Parr, Duke University, Durham, NC
Lihong Li, Rutgers University, Piscataway, NJ
Gavin Taylor, Duke University, Durham, NC
Christopher Painter-Wakefield, Duke University, Durham, NC
Michael L. Littman, Rutgers University, Piscataway, NJ
Sponsors
Yahoo!
Xerox
IBM
NSF
Microsoft Research
Machine Learning Journal/Springer
Pascal
University of Helsinki
Federation of Finnish Learned Societies
Intel Corporation
Google
Helsinki Institute for Information Technology
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11, Downloads (12 Months): 52, Citation Count: 2
DOI: http://doi.acm.org/10.1145/1390156.1390251

ABSTRACT

We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.
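
The two claims in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only: the 3-state chain, the two features, the discount factor, and all helper names (matmul, solve, lin) are assumptions made for this example, and it assumes unweighted least squares when fitting the linear model. It compares the weights from the linear fixed point with the weights obtained by solving the approximate linear model, and checks that the Bellman error equals the reward-prediction error plus gamma times the feature-prediction error applied to the weights.

```python
from fractions import Fraction as F

def matmul(A, B):
    # Plain matrix product on lists of lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def lin(A, B, c):
    # Elementwise A + c * B.
    return [[a + c * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def solve(A, B):
    # Solve A X = B by Gauss-Jordan elimination (A square and nonsingular).
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical toy problem (not taken from the paper).
gamma = F(9, 10)
P = [[F(1, 2), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]           # transition matrix
R = [[F(0)], [F(0)], [F(1)]]             # reward vector
Phi = [[F(1), F(0)],
       [F(1), F(1)],
       [F(1), F(2)]]                     # feature matrix, one row per state

PhiT = transpose(Phi)
G = matmul(PhiT, Phi)                    # Gram matrix Phi^T Phi
PPhi = matmul(P, Phi)                    # expected next features

# Approximate linear model: least-squares fit of next features and reward.
P_phi = solve(G, matmul(PhiT, PPhi))
r_phi = solve(G, matmul(PhiT, R))

# Value function of the approximate model: w = (I - gamma P_phi)^-1 r_phi.
I_k = [[F(1), F(0)], [F(0), F(1)]]
w_model = solve(lin(I_k, P_phi, -gamma), r_phi)

# Linear fixed point: w = (Phi^T Phi - gamma Phi^T P Phi)^-1 Phi^T R.
w_fp = solve(lin(G, matmul(PhiT, PPhi), -gamma), matmul(PhiT, R))

# Bellman error versus the model-error decomposition.
bellman_error = lin(lin(R, matmul(PPhi, w_fp), gamma), matmul(Phi, w_fp), -1)
delta_R = lin(R, matmul(Phi, r_phi), -1)          # reward-prediction error
delta_Phi = lin(PPhi, matmul(Phi, P_phi), -1)     # feature-prediction error
decomposition = lin(delta_R, matmul(delta_Phi, w_fp), gamma)

print(w_model == w_fp)
print(bellman_error == decomposition)
```

Exact rational arithmetic is used so that the comparisons are not affected by floating-point rounding; with floats, a tolerance-based comparison would be needed instead.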

