An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning

Source
ICML; Vol. 307
Proceedings of the 25th International Conference on Machine Learning
Helsinki, Finland
Pages 752-759
Year of Publication: 2008
ISBN: 978-1-60558-205-4

Authors
Ronald Parr, Duke University, Durham, NC
Lihong Li, Rutgers University, Piscataway, NJ
Gavin Taylor, Duke University, Durham, NC
Christopher Painter-Wakefield, Duke University, Durham, NC
Michael L. Littman, Rutgers University, Piscataway, NJ

Bibliometrics
Downloads (6 Weeks): 11, Downloads (12 Months): 57, Citation Count: 2

ABSTRACT
We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.
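The first two claims in the abstract can be checked numerically on a small, fully known problem. The sketch below is an illustration, not code from the paper, and it rests on assumptions of its own: a fixed policy with a known transition matrix `P` and reward vector `R`, a feature matrix `Phi` with linearly independent columns, and an unweighted least-squares projection onto the span of the features. It builds an approximate linear model (`P_phi`, `r_phi`) by projecting `P @ Phi` and `R` onto the features, checks that the exact value of that model coincides with the linear TD (LSTD) fixed point, and checks that the Bellman error splits into a reward-approximation term plus a discounted feature-transition term, BE = Δ_R + γ Δ_Φ w. All function names and the random data are hypothetical.

```python
import numpy as np


def linear_model(Phi, P, R):
    """Least-squares linear model in feature space (unweighted projection assumed).

    Returns P_phi (k x k) and r_phi (k,) such that Phi @ P_phi and Phi @ r_phi
    are the least-squares fits of P @ Phi and R onto the columns of Phi.
    """
    # (Phi^T Phi)^{-1} Phi^T, computed with a linear solve instead of an explicit inverse
    proj = np.linalg.solve(Phi.T @ Phi, Phi.T)
    P_phi = proj @ (P @ Phi)
    r_phi = proj @ R
    return P_phi, r_phi


def model_value_weights(Phi, P, R, gamma):
    """Exact value of the approximate linear model: w = (I - gamma * P_phi)^{-1} r_phi."""
    P_phi, r_phi = linear_model(Phi, P, R)
    k = Phi.shape[1]
    return np.linalg.solve(np.eye(k) - gamma * P_phi, r_phi)


def lstd_weights(Phi, P, R, gamma):
    """Linear TD fixed point Phi w = Proj(R + gamma P Phi w), solved in closed form."""
    A = Phi.T @ (Phi - gamma * (P @ Phi))
    b = Phi.T @ R
    return np.linalg.solve(A, b)


def bellman_error_terms(Phi, P, R, gamma):
    """Return the Bellman error and its decomposition delta_R + gamma * delta_Phi @ w."""
    P_phi, r_phi = linear_model(Phi, P, R)
    w = model_value_weights(Phi, P, R, gamma)
    delta_R = R - Phi @ r_phi            # reward not captured by the features
    delta_Phi = P @ Phi - Phi @ P_phi    # next-state features not captured by the features
    bellman_error = R + gamma * (P @ Phi) @ w - Phi @ w
    return bellman_error, delta_R + gamma * delta_Phi @ w


# Small random example (hypothetical data): 6 states, 2 features, discount 0.9.
rng = np.random.default_rng(0)
n, k, gamma = 6, 2, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)       # make each row a probability distribution
R = rng.random(n)
Phi = rng.random((n, k))

w_model = model_value_weights(Phi, P, R, gamma)
w_lstd = lstd_weights(Phi, P, R, gamma)
be, decomposed = bellman_error_terms(Phi, P, R, gamma)
print("same weights:", np.allclose(w_model, w_lstd))
print("decomposition holds:", np.allclose(be, decomposed))
```

Under these assumptions the Bellman error vanishes whenever both Δ_R and Δ_Φ are zero, that is, when the span of the features contains `R` and is closed under `P`. Comparing the sizes of the two terms is one way to see whether a new feature should target the reward or the dynamics, which is the kind of guidance for feature selection the abstract describes.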