Sigma point policy iteration
Source
International Conference on Autonomous Agents
Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 1
Estoril, Portugal
Session: Agent and multi-agent learning
Pages: 379-386
Year of Publication: 2008
ISBN: 978-0-9817381-0-9
Authors
Michael Bowling  University of Alberta, Edmonton, AB
Alborz Geramifard  University of Alberta, Edmonton, AB
David Wingate  University of Michigan, Ann Arbor, MI
Sponsors
ACM: Association for Computing Machinery
AAAI: Association for the Advancement of Artificial Intelligence
Publisher

ABSTRACT

In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear value function approximation. These algorithms rely on policy-dependent expectations of the transition and reward functions, which require all experience to be remembered and iterated over for each new policy evaluated. We propose to summarize experience with a compact policy-independent Gaussian model. We show how this policy-independent model can be transformed into a policy-dependent form and used to perform policy evaluation. Because closed-form transformations are rarely available, we introduce an efficient sigma point approximation. We show that the resulting Sigma-Point Policy Iteration algorithm (SPPI) is mathematically equivalent to LSPI for tabular representations and empirically demonstrate comparable performance for approximate representations. However, the experience does not need to be saved or replayed, meaning that for even moderate amounts of experience, SPPI is an order of magnitude faster than LSPI.
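
The sigma point idea mentioned in the abstract can be illustrated with a basic unscented transform in the style of Julier and Uhlmann. The sketch below is not the SPPI algorithm itself. It only shows how a Gaussian (mean and covariance) can be pushed through a function by evaluating that function at a small set of deterministically chosen points. The function name `sigma_point_transform` and the scaling parameter `kappa` are assumptions made for this illustration and do not come from the paper.

```python
import numpy as np


def sigma_point_transform(mean, cov, f, kappa=1.0):
    """Approximate the mean and covariance of f(x) for x ~ N(mean, cov).

    Illustrative sketch of a basic unscented transform; the name and the
    default kappa are assumptions, not taken from the paper.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.shape[0]

    # Matrix square root of the scaled covariance (cov must be positive definite).
    root = np.linalg.cholesky((n + kappa) * cov)

    # 2n + 1 sigma points: the mean, plus and minus each column of the root.
    points = np.vstack([mean, mean + root.T, mean - root.T])

    # Weights sum to one; the centre point gets kappa / (n + kappa).
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)

    # Push every sigma point through f and take weighted statistics.
    outputs = np.array([np.atleast_1d(f(p)) for p in points])
    out_mean = weights @ outputs
    centered = outputs - out_mean
    out_cov = (weights[:, None] * centered).T @ centered
    return out_mean, out_cov


# Usage: for a linear map the approximation is exact, which makes a handy check.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([0.5, -1.0])
m = np.array([1.0, 2.0])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
m_y, P_y = sigma_point_transform(m, P, lambda x: A @ x + b)
```

For the linear map in the usage lines, `m_y` agrees with `A @ m + b` and `P_y` with `A @ P @ A.T` up to floating-point error. For nonlinear functions the result is only an approximation, which is the situation the abstract describes when closed-form transformations are unavailable.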

