An analytic solution to discrete Bayesian reinforcement learning
Source: ACM International Conference Proceeding Series, Vol. 148
Proceedings of the 23rd International Conference on Machine Learning
Pittsburgh, Pennsylvania
Pages: 697 - 704  
Year of Publication: 2006
ISBN:1-59593-383-2
Authors
Pascal Poupart  University of Waterloo, Waterloo, Ontario, Canada
Nikos Vlassis  University of Amsterdam, Amsterdam, The Netherlands
Jesse Hoey  University of Toronto, Toronto, Ontario, Canada
Kevin Regan  University of Waterloo, Waterloo, Ontario, Canada
Publisher
ACM  New York, NY, USA
Bibliometrics
Citation Count: 5
DOI: http://doi.acm.org/10.1145/1143844.1143932

ABSTRACT

Reinforcement learning (RL) was originally proposed as a framework that allows agents to learn online as they interact with their environment. Existing RL algorithms fall short of this goal because the exploration they require is often too costly or too time-consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are an analytical derivation showing that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.
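
To make the phrase "upper envelope of a set of multivariate polynomials" concrete, the following is a minimal sketch and not the BEETLE algorithm itself. It assumes a hypothetical encoding in which each polynomial is a list of (coefficient, exponents) terms over the unknown model parameters; the abstract does not specify the actual parameterization, and the two example polynomials are invented for illustration. The value at a given parameter point is then the maximum over all polynomials evaluated at that point.

```python
from math import prod

# Hypothetical encoding (an assumption, not taken from the paper):
# a polynomial is a list of (coefficient, exponents) terms, where
# exponents[i] is the power of the i-th unknown parameter.
def eval_poly(poly, theta):
    """Evaluate one multivariate polynomial at the point theta."""
    return sum(c * prod(t ** e for t, e in zip(theta, exps))
               for c, exps in poly)

def upper_envelope(polys, theta):
    """Return (value, index) of the polynomial that is largest at theta."""
    values = [eval_poly(p, theta) for p in polys]
    best = max(range(len(polys)), key=values.__getitem__)
    return values[best], best

# Two invented polynomials in two parameters (theta1, theta2).
polys = [
    [(1.0, (1, 0)), (0.5, (0, 1))],  # 1.0*theta1 + 0.5*theta2
    [(2.0, (1, 1))],                 # 2.0*theta1*theta2
]

value, index = upper_envelope(polys, (0.9, 0.9))
print(value, index)
```

In a point-based scheme, a function like `upper_envelope` would be queried only at a finite set of sampled points, which is what keeps the set of polynomials small; how those points are chosen is outside the scope of this sketch.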



