Model-free reinforcement learning as mixture learning
Source: ACM International Conference Proceeding Series, Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages: 1081-1088
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
Nikos Vlassis  Technical University of Crete, Chania, Greece
Marc Toussaint  TU Berlin, Berlin, Germany
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6, Downloads (12 Months): 30, Citation Count: 0
DOI: http://doi.acm.org/10.1145/1553374.1553512

ABSTRACT

We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite-horizon and finite-horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm, such as Sarsa(1), that can be applied to both MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools for designing and analyzing algorithms in that family. On the practical side, preliminary experiments on a POMDP problem produced encouraging results.
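As a rough illustration of the algorithm family the abstract relates the stochastic EM method to, the Python sketch below runs tabular, non-bootstrapping optimistic policy iteration on a hypothetical five-state chain. It is Sarsa(1)-like only in that it updates Q toward full Monte Carlo returns instead of bootstrapped targets, and it improves the policy after every episode. It also estimates a mixture likelihood under the assumption, taken from the common reward-as-likelihood construction, that rewards lie in [0, 1] and the horizon T has prior P(T) = (1 - γ)γ^T, so that L(π) equals (1 - γ) times the expected discounted return. The chain, the ε-greedy improvement rule, the 1/n step size, and every function name are assumptions for illustration; this is not the authors' Stochastic Approximation EM algorithm.

```python
import random

# Hypothetical toy problem (an assumption, not from the paper): a chain of
# N_STATES states. Action 0 moves left, action 1 moves right; entering the
# last state yields reward 1 and ends the episode, so every reward and
# every discounted return lies in [0, 1].
N_STATES, N_ACTIONS, GAMMA, MAX_STEPS = 5, 2, 0.9, 50


def step(s, a):
    """Deterministic chain dynamics; returns (next_state, reward, done)."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, eps, rng):
    """Assumed improvement rule: act greedily on Q, explore with prob. eps."""
    if rng.random() < eps:
        return rng.randrange(N_ACTIONS)
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def run_episode(q, eps, rng):
    """Sample one trajectory from state 0; returns a list of (s, a, r)."""
    s, traj = 0, []
    for _ in range(MAX_STEPS):
        a = epsilon_greedy(q[s], eps, rng)
        s_next, r, done = step(s, a)
        traj.append((s, a, r))
        s = s_next
        if done:
            break
    return traj


def discounted_returns(traj):
    """Full Monte Carlo returns G_t = r_t + GAMMA * G_{t+1} (no bootstrapping)."""
    g, out = 0.0, []
    for _, _, r in reversed(traj):
        g = r + GAMMA * g
        out.append(g)
    return out[::-1]


def mc_optimistic_pi(n_episodes=2000, eps=0.1, seed=0):
    """Optimistic policy iteration: Q is updated after every episode and the
    next episode immediately acts on the updated estimates."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    counts = [[0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(n_episodes):
        traj = run_episode(q, eps, rng)
        for (s, a, _), g in zip(traj, discounted_returns(traj)):
            counts[s][a] += 1
            # Stochastic-approximation update with step size 1/n(s, a).
            q[s][a] += (g - q[s][a]) / counts[s][a]
    return q


def mixture_likelihood(q, eps=0.0, n_eval=200, seed=1):
    """Monte Carlo estimate of L(pi) = (1 - GAMMA) * E[sum_T GAMMA**T r_T],
    reading r in [0, 1] as the probability of a binary 'reward' event under
    the horizon prior P(T) = (1 - GAMMA) * GAMMA**T."""
    rng = random.Random(seed)
    total = sum(discounted_returns(run_episode(q, eps, rng))[0] for _ in range(n_eval))
    return (1.0 - GAMMA) * total / n_eval


if __name__ == "__main__":
    q = mc_optimistic_pi()
    print([max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES - 1)])
    print(mixture_likelihood(q))
```

Under these assumptions the greedy policy should learn to move right in every non-terminal state. The likelihood of that policy from state 0 is then (1 - γ)γ^3 ≈ 0.0729, because the single unit reward arrives at T = 3.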


