A semiparametric statistical approach to model-free policy evaluation
Source: ICML; Vol. 307
Proceedings of the 25th International Conference on Machine Learning
Helsinki, Finland
Pages: 1072-1079
Year of Publication: 2008
ISBN: 978-1-60558-205-4
Authors
Tsuyoshi Ueno  Kyoto University, Kyoto, Japan
Motoaki Kawanabe  Fraunhofer FIRST, IDA, Berlin, Germany
Takeshi Mori  Kyoto University, Kyoto, Japan
Shin-ichi Maeda  Kyoto University, Kyoto, Japan
Shin Ishii  Kyoto University, Kyoto, Japan
Sponsors
Yahoo!
Xerox
IBM
NSF
Microsoft Research
Machine Learning Journal/Springer
Pascal
University of Helsinki
Federation of Finnish Learned Societies
Intel Corporation
Google
Helsinki Institute for Information Technology
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390156.1390291

ABSTRACT

Reinforcement learning (RL) methods based on least-squares temporal difference (LSTD) have been developed recently and have shown good practical performance. However, the quality of their estimates has not been well understood. In this article, we discuss LSTD-based policy evaluation from the new viewpoint of semiparametric statistical inference. Specifically, the LSTD estimator can be obtained from a particular estimating function that guarantees its asymptotic convergence to the true value without specifying a model of the environment. Based on these observations, we 1) analyze the asymptotic variance of an LSTD-based estimator, 2) derive the optimal estimating function, which attains the minimum asymptotic estimation variance, and 3) derive a suboptimal estimator that reduces the computational burden of obtaining the optimal estimating function.
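
To make the model-free setting concrete, the following is a minimal sketch of the plain LSTD estimator only; it is not the optimal or suboptimal estimator derived in the paper. It assumes a linear value function V(s) ≈ φ(s)ᵀθ and solves the standard LSTD equation Σ φ(s_t)(r_t + γ φ(s_{t+1})ᵀθ − φ(s_t)ᵀθ) = 0 from sampled transitions, without using the transition probabilities. The three-state Markov reward process, the one-hot features, and all names (`lstd`, `simulate`, `solve`, `P`, `R`, `GAMMA`) are hypothetical choices made for illustration.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd(transitions, features, gamma):
    """Plain LSTD: accumulate A and b over (s, r, s_next), then solve A theta = b."""
    d = len(features(transitions[0][0]))
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s, r, s_next in transitions:
        phi, phi_next = features(s), features(s_next)
        for i in range(d):
            b[i] += phi[i] * r
            for j in range(d):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


# Hypothetical toy Markov reward process (policy already folded into P).
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.4, 0.0, 0.6]]
R = [0.0, 1.0, -1.0]  # reward received when leaving each state
GAMMA = 0.9


def one_hot(s):
    """Tabular features, so the linear model can represent V exactly."""
    return [1.0 if i == s else 0.0 for i in range(len(P))]


def simulate(n_steps, seed=0):
    """Sample one trajectory; the estimator only ever sees these samples."""
    rng = random.Random(seed)
    s, out = 0, []
    for _ in range(n_steps):
        s_next = rng.choices(range(len(P)), weights=P[s])[0]
        out.append((s, R[s], s_next))
        s = s_next
    return out


def true_values():
    """Reference solution of (I - gamma P) v = R, which does use the model."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - GAMMA * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, R)


theta = lstd(simulate(50000), one_hot, GAMMA)
print("LSTD estimate:", theta)
print("model-based  :", true_values())
```

With one-hot features the estimate should approach the model-based values as the trajectory grows; how fast the sampling error shrinks is exactly the asymptotic-variance question the abstract raises.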

