ACM Home Page
Please provide us with feedback. Feedback
A worst-case comparison between temporal difference and residual gradient with linear function approximation
Full text PdfPdf (344 KB)
Source ICML; Vol. 307 archive
Proceedings of the 25th international conference on Machine learning table of contents
Helsinki, Finland
Pages 560-567  
Year of Publication: 2008
ISBN:978-1-60558-205-4
Author
Lihong Li  Rutgers University, Piscataway, NJ
Sponsors
: Yahoo!
: Xerox
IBM : IBM
: NSF
Microsoft Research : Microsoft Research
: Machine Learning Journal/Springer
: Pascal
: University of Helsinki
: Federation of Finnish Learned Societies
: Intel Corporation
: Google
: Helsinki Institute for Information Technology
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 21,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1390156.1390227
What is a DOI?

ABSTRACT

Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them except in very limited cases. This paper employs techniques from online learning of linear functions and provides a worst-case (non-probabilistic) analysis to compare these two types of algorithms when linear function approximation is used. No statistical assumptions are made on the sequence of observations, so the analysis applies to non-Markovian and even adversarial domains as well. In particular, our results suggest that RG may result in smaller temporal differences, while TD(0) is more likely to yield smaller prediction errors. These phenomena can be observed even in two simple Markov chain examples that are non-adversarial.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. Proceedings of the Twelfth International Conference on Machine Learning (ICML-95) (pp. 30--37).
 
2
 
3
Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. Advances in Neural Information Processing Systems 7 (NIPS-94) (pp. 369--376).
 
4
Cesa-Bianchi, N., Long, P. M., & Warmuth, M. (1996). Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7, 604--619.
 
5
 
6
 
7
 
8
Munos, R. (2003). Error bounds for approximate policy iteration. Proceedings of the Twentieth International Conference on Machine Learning (ICML-03) (pp. 560--567).
 
9
 
10
 
11
 
12
Schoknecht, R., & Merke, A. (2003). TD(0) converges provably faster than the residual gradient algorithm. Proceedings of the Twentieth International Conference on Machine Learning (ICML-03) (pp. 680--687).
 
13
 
14
 
15
Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactons on Automatic Control, 42, 674--690.