ACM Home Page
Please provide us with feedback. Feedback
Region-based value iteration for partially observable Markov decision processes
Full text PdfPdf (279 KB)
Source ACM International Conference Proceeding Series; Vol. 148 archive
Proceedings of the 23rd international conference on Machine learning table of contents
Pittsburgh, Pennsylvania
Pages: 561 - 568  
Year of Publication: 2006
ISBN:1-59593-383-2
Authors
Hui Li  Duke University, Durham, NC
Xuejun Liao  Duke University, Durham, NC
Lawrence Carin  Duke University, Durham, NC
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 14,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1143844.1143915
What is a DOI?

ABSTRACT

An approximate region-based value iteration (RBVI) algorithm is proposed to find the optimal policy for a partially observable Markov decision process (POMDP). The proposed RBVI approximates the true polyhedral partition of the belief simplex with an ellipsoidal partition, such that the optimal value function is linear in each of the ellipsoidal regions. The position and shape of each region, as well as the gradient (alpha-vector) of the optimal value function in the region, are parameterized explicitly, and are estimated via efficient expectation maximization (EM) and variational Bayesian EM (VBEM), based on a set of selected sample belief points. The RBVI maintains a much smaller number of alpha-vectors than point-based methods and yields a more parsimonious representation that approximates the true value function in the maximum likelihood (ML) sense. The results on benchmark problems show that the proposed RBVI is comparable in performance to state-of-the-art algorithms, despite of the small number of alpha-vectors that are used.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Brafman, R. I. (1997). A heuristic variable grid solution method for pomdps. AAAI (pp. 727--733).
 
2
 
3
Corduneanu, A., & Bishop, C. (2001). Variational Bayesian model selection for mixture distributions. AI and Statistics (pp. 27--34).
 
4
 
5
Littman, M. L., Cassandra, A. R., & Kaelbling, L. P. (1995). Learning policies for partially obsevable environments:scaling up. ICML (pp. 362--370).
 
6
 
7
Pineau, J., Gordon, G., & Thrun, S. (2003). Point-based value iteration: An anytime algorithm for POMDPs. IJ-CAI (pp. 1025 -- 1032).
 
8
Poon, K.-M. (2001). A fast heuristic algorithm for decision-theoretic planning. Master's thesis, The Hong Kong University of Science and Technology.
 
9
Poupart, P., & Boutilier, C. (2003). Bounded finite state controllers. NIPS.
 
10
Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operational Research, 21, 1071--1088.
 
11
 
12
Smith, T., & Simmons, R. (2005). Point-based POMDP algorithms: Improved analysis and implementation. Proc. of UAI.
 
13
Sondik, E. J. (1971). The optimal control of partially observable Markov processes. Doctoral dissertation, Stanford University.
 
14
Spaan, M., & Vlassis, N. (2004). A point-based POMDP algorithm for robot planning. Proc. IEEE Int. Conf. on Robotics and Automation (ICRA) (pp. 2399--2404).


Collaborative Colleagues:
Hui Li: colleagues
Xuejun Liao: colleagues
Lawrence Carin: colleagues