Fast gradient-descent methods for temporal-difference learning with linear function approximation
Source ACM International Conference Proceeding Series; Vol. 382 archive
Proceedings of the 26th Annual International Conference on Machine Learning table of contents
Montreal, Quebec, Canada
Pages 993-1000  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Richard S. Sutton  University of Alberta, Edmonton, Canada
Hamid Reza Maei  University of Alberta, Edmonton, Canada
Doina Precup  McGill University, Montreal, Canada
Shalabh Bhatnagar  Indian Institute of Science, Bangalore, India
David Silver  University of Alberta, Edmonton, Canada
Csaba Szepesvári  University of Alberta, Edmonton, Canada
Eric Wiewiora  University of Alberta, Edmonton, Canada
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553501
ABSTRACT

Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose complexity scales only linearly in the size of the function approximator. Although their gradient temporal difference (GTD) algorithm converges reliably, it can be very slow compared to conventional linear TD (on on-policy problems where TD is convergent), calling into question its practical utility. In this paper we introduce two new related algorithms with better convergence rates. The first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD). The second new algorithm, linear TD with gradient correction, or TDC, uses the same update rule as conventional TD except for an additional term which is initially zero. In our experiments on small test problems and in a Computer Go application with a million features, the learning rate of this algorithm was comparable to that of conventional TD. This algorithm appears to extend linear TD to off-policy learning with no penalty in performance while only doubling computational requirements.
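
The abstract does not state the update equations, so the following is only a minimal sketch of how the TDC and GTD2 updates are commonly written for the on-policy case, placed next to conventional linear TD(0) for comparison. The symbols (theta for the value weights, w for the secondary weights, alpha and beta for the two step sizes, gamma for the discount, phi and phi_next for feature vectors) and all function names are assumptions introduced here for illustration; off-policy importance weighting is omitted.

```python
# Sketch only: names and the exact form of the updates are assumptions,
# not taken from the abstract. Plain Python lists stand in for vectors.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def td_error(theta, phi, reward, phi_next, gamma):
    """One-step TD error for a linear value estimate theta . phi."""
    return reward + gamma * dot(theta, phi_next) - dot(theta, phi)

def td_update(theta, phi, reward, phi_next, alpha, gamma):
    """Conventional linear TD(0): theta += alpha * delta * phi."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    return [t + alpha * delta * p for t, p in zip(theta, phi)]

def tdc_update(theta, w, phi, reward, phi_next, alpha, beta, gamma):
    """TD with gradient correction: the TD update plus a correction term
    -alpha * gamma * phi_next * (phi . w), which vanishes while w is zero."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    phi_w = dot(phi, w)
    new_theta = [t + alpha * (delta * p - gamma * pn * phi_w)
                 for t, p, pn in zip(theta, phi, phi_next)]
    # The secondary weights w track the expected TD error given the features.
    new_w = [wi + beta * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w

def gtd2_update(theta, w, phi, reward, phi_next, alpha, beta, gamma):
    """GTD2: theta moves along (phi - gamma * phi_next) scaled by phi . w."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    phi_w = dot(phi, w)
    new_theta = [t + alpha * (p - gamma * pn) * phi_w
                 for t, p, pn in zip(theta, phi, phi_next)]
    new_w = [wi + beta * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w
```

In this sketch, starting from w = 0 makes the first TDC step identical to the TD step, which matches the abstract's remark that the additional term is initially zero. Each update keeps two weight vectors and a constant number of inner products, consistent with the stated doubling of computational cost.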


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
2. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th Int. Conf. on Machine Learning, pp. 30-37.
3. Baird, L. C. (1999). Reinforcement Learning Through Gradient Descent. PhD thesis, Carnegie-Mellon University.
4. Barnard, E. (1993). Temporal-difference methods and Markov models. IEEE Transactions on Systems, Man, and Cybernetics 23(2):357-365.
10. Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-square temporal difference learning. Proceedings AAAI, pp. 356-361.
13. Precup, D., Sutton, R. S., Paduraru, C., Koop, A., Singh, S. (2006). Off-policy learning with recognizers. Advances in Neural Information Processing Systems 18.
14. Silver, D., Sutton, R. S., Müller, M. (2007). Reinforcement learning of local shape in the game of Go. Proceedings of the 20th IJCAI, pp. 1053-1058.
15. Sturtevant, N. R., White, A. M. (2006). Feature construction for reinforcement learning in hearts. In Proceedings of the 5th International Conf. on Computers and Games.
18. Sutton, R. S., Szepesvári, Cs., Maei, H. R. (2009). A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Advances in Neural Information Processing Systems 21.
