Utile distinction hidden Markov models
Source: ACM International Conference Proceeding Series, Vol. 69
Proceedings of the Twenty-First International Conference on Machine Learning
Banff, Alberta, Canada
Page: 108  
Year of Publication: 2004
ISBN: 1-58113-828-5
Authors
Daan Wierstra, Utrecht University, The Netherlands
Marco Wiering, Utrecht University, The Netherlands
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1015330.1015346

ABSTRACT

This paper addresses the problem of constructing good action-selection policies for agents acting in partially observable environments, a class of problems generally known as Partially Observable Markov Decision Processes (POMDPs). We present a novel approach that uses a modification of the well-known Baum-Welch algorithm to learn a Hidden Markov Model (HMM) that predicts both percepts and utility in a non-deterministic world. This enables an agent to make decisions based on its history of actions, observations, and rewards. Our algorithm, called Utile Distinction Hidden Markov Models (UDHMM), handles memory creation well: it tends to create perceptual and utility distinctions only when they are needed, while still being able to discriminate states based on histories of arbitrary length. Experimental results in highly stochastic problem domains show very good performance.
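The abstract describes an agent that acts on an HMM learned from its history of actions and observations. The Python sketch below shows only the standard building block such a model relies on: a scaled forward pass (the filtering step also used in the E-step of Baum-Welch) over a discrete HMM whose transitions depend on the chosen action. It is not the UDHMM algorithm. It omits utility prediction and the paper's modified re-estimation, and the function name `forward_filter`, its argument layout, and the toy parameters are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def forward_filter(
    prior: Sequence[float],
    trans: Dict[str, Matrix],
    emit: Matrix,
    history: Sequence[Tuple[str, int]],
) -> Tuple[List[float], float]:
    """Scaled forward pass over a hypothetical action-conditioned discrete HMM.

    prior[i]       = P(s_0 = i)
    trans[a][i][j] = P(s_{t+1} = j | s_t = i, action a)
    emit[j][o]     = P(o_t = o | s_t = j)
    history        = [(a_1, o_1), (a_2, o_2), ...]

    Returns the belief P(s_t | history) after the last step and the
    log-likelihood log P(o_1..o_t | a_1..a_t).
    """
    belief = list(prior)
    n = len(belief)
    log_lik = 0.0
    for action, obs in history:
        T = trans[action]
        # Predict: propagate the belief through the action's transition matrix.
        predicted = [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]
        # Correct: weight each state by the probability that it emits `obs`.
        unnorm = [predicted[j] * emit[j][obs] for j in range(n)]
        z = sum(unnorm)  # P(o_t | earlier observations and actions), the scale factor
        if z == 0.0:
            raise ValueError("observation has zero probability under the model")
        log_lik += math.log(z)
        # Normalising keeps the belief a distribution and avoids underflow.
        belief = [u / z for u in unnorm]
    return belief, log_lik


if __name__ == "__main__":
    # Hypothetical two-state toy model: "stay" keeps the state, "swap" flips it.
    trans = {
        "stay": [[1.0, 0.0], [0.0, 1.0]],
        "swap": [[0.0, 1.0], [1.0, 0.0]],
    }
    emit = [[0.9, 0.1], [0.2, 0.8]]
    belief, log_lik = forward_filter([0.5, 0.5], trans, emit, [("stay", 0), ("swap", 0)])
    print(belief, math.exp(log_lik))
```

With this toy model, observing 0 after `stay` and again after `swap` yields a belief of roughly [0.5, 0.5] and `exp(log_lik)` of roughly 0.18, which matches summing over the two possible start states by hand: 0.5·0.9·0.2 + 0.5·0.2·0.9. Normalising at every step, rather than multiplying raw probabilities, is the usual way to keep long histories numerically stable.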

