An analytic solution to discrete Bayesian reinforcement learning
Source
Proceedings of the 23rd International Conference on Machine Learning
ACM International Conference Proceeding Series, Vol. 148
Pittsburgh, Pennsylvania
Pages: 697-704
Year of Publication: 2006
ISBN: 1-59593-383-2
Authors
Pascal Poupart, University of Waterloo, Waterloo, Ontario, Canada
Nikos Vlassis, University of Amsterdam, Amsterdam, The Netherlands
Jesse Hoey, University of Toronto, Toronto, Ontario, Canada
Kevin Regan, University of Waterloo, Waterloo, Ontario, Canada
ABSTRACT
Reinforcement learning (RL) was originally proposed as a framework that allows agents to learn online as they interact with their environment. Existing RL algorithms fall short of this goal because the exploration they require is often too costly and/or too time-consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are an analytical derivation showing that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient point-based value iteration algorithm that exploits this simple parameterization.
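To make the phrase "upper envelope of a set of multivariate polynomials" concrete, here is a minimal Python sketch of evaluating such an envelope at a single point. It is only an illustration and not the BEETLE algorithm: the sparse-dictionary representation, the function names `evaluate` and `upper_envelope`, and the example polynomials are all assumptions introduced here, and the abstract does not say how the polynomials are stored or what their variables are.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation (not from the paper): a polynomial maps an
# exponent tuple to its coefficient, e.g. {(1, 0): 2.0, (0, 2): -1.0}
# stands for 2*x0 - x1**2.
Polynomial = Dict[Tuple[int, ...], float]


def evaluate(poly: Polynomial, point: Sequence[float]) -> float:
    """Evaluate one multivariate polynomial at the given point."""
    return sum(
        coeff * prod(x ** e for x, e in zip(point, exponents))
        for exponents, coeff in poly.items()
    )


def upper_envelope(polys: List[Polynomial], point: Sequence[float]) -> Tuple[float, int]:
    """Return the largest value over the set and the index of the polynomial attaining it."""
    values = [evaluate(p, point) for p in polys]
    best = max(range(len(values)), key=values.__getitem__)
    return values[best], best


# Illustrative polynomials in two variables: x0, x1, and 4*x0*x1.
example_polys: List[Polynomial] = [
    {(1, 0): 1.0},
    {(0, 1): 1.0},
    {(1, 1): 4.0},
]
```

Calling `upper_envelope(example_polys, (0.5, 0.5))` compares the three polynomials at that point and reports which one is on top there; at a different point a different polynomial may dominate, which is what makes the envelope piecewise.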