A stochastic memoizer for sequence data
Source: ACM International Conference Proceeding Series; Vol. 382
Proceedings of the 26th Annual International Conference on Machine Learning
Montreal, Quebec, Canada
Pages 1129-1136  
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
Frank Wood  University College London, London, UK
Cédric Archambeau  University College London, London, UK
Jan Gasthaus  University College London, London, UK
Lancelot James  Hong Kong University of Science and Technology, Kowloon, Hong Kong
Yee Whye Teh  University College London, London, UK
Sponsors
MITACS
NSF
Microsoft Research
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1553374.1553518

ABSTRACT

We propose an unbounded-depth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unbounded-depth hierarchical Pitman-Yor process. We introduce analytic marginalization steps (using coagulation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce fragmentation operators necessary to do predictive inference. We demonstrate the sequence memoizer by using it as a language model, achieving state-of-the-art results.
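
As a rough illustration of the building blocks the abstract names, the sketch below computes the standard Chinese-restaurant predictive probability for a single Pitman-Yor process and shows the coagulation identity used to collapse chains of contexts: when the concentration parameter is zero, stacking Pitman-Yor processes with discounts d1 and d2 marginalizes to a single Pitman-Yor process with discount d1 * d2. This is a minimal sketch, not the authors' implementation; the function names `py_predictive` and `collapsed_discount`, the count dictionaries, and the example numbers are all assumptions made for illustration.

```python
from math import prod


def py_predictive(symbol, customers, tables, base_prob, discount, concentration=0.0):
    """Predictive probability of `symbol` in one Pitman-Yor restaurant.

    customers[s]: number of customers eating dish s (hypothetical bookkeeping)
    tables[s]:    number of tables serving dish s
    base_prob:    probability of `symbol` under the parent (base) distribution
    """
    n = sum(customers.values())  # total customers in this restaurant
    t = sum(tables.values())     # total tables in this restaurant
    if n == 0:
        # No observations in this context: fall back to the base distribution.
        return base_prob
    c_s = customers.get(symbol, 0)
    t_s = tables.get(symbol, 0)
    # Mass for joining an existing table that already serves `symbol`.
    seated = (c_s - discount * t_s) / (concentration + n)
    # Mass for opening a new table, whose dish is drawn from the base distribution.
    new_table = (concentration + discount * t) / (concentration + n)
    return seated + new_table * base_prob


def collapsed_discount(discounts):
    """Discount of the single Pitman-Yor process obtained by marginalizing a
    non-branching chain of processes, assuming every concentration is zero."""
    return prod(discounts)


# Illustrative counts over a two-symbol alphabet with a uniform base distribution.
customers = {"a": 3, "b": 1}
tables = {"a": 1, "b": 1}
p_a = py_predictive("a", customers, tables, base_prob=0.5, discount=0.5)
p_b = py_predictive("b", customers, tables, base_prob=0.5, discount=0.5)
print(p_a, p_b, collapsed_discount([0.5, 0.8]))
```

In a hierarchical model, `base_prob` would itself come from the same rule applied to the parent context, which is how statistical strength is shared between contexts. Multiplying discounts along a chain is what lets long runs of non-branching contexts be stored as a single node.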


