Memory bounded inference in topic models
Source: ICML; Vol. 307
Proceedings of the 25th International Conference on Machine Learning
Helsinki, Finland
Pages: 344-351
Year of Publication: 2008
ISBN: 978-1-60558-205-4
Authors
Ryan Gomes, California Institute of Technology, Pasadena, CA
Max Welling, University of California at Irvine, Irvine, CA
Pietro Perona, California Institute of Technology, Pasadena, CA
Sponsors
Yahoo!
Xerox
IBM
NSF
Microsoft Research
Machine Learning Journal/Springer
Pascal
University of Helsinki
Federation of Finnish Learned Societies
Intel Corporation
Google
Helsinki Institute for Information Technology
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390156.1390200

ABSTRACT

What types of algorithms and statistical techniques support learning from very large datasets over long stretches of time? We address this question through a memory-bounded version of a variational EM algorithm that approximates inference in a topic model. The algorithm alternates between two phases, "model building" and "model compression", so that a given memory constraint is always satisfied. In the model building phase, Bayesian model selection expands the internal representation (the number of topics) as more data arrives. Compression is achieved by merging data items into clumps and caching only their sufficient statistics. Empirically, the resulting algorithm handles datasets that are orders of magnitude larger than those the standard batch version can handle.
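The compression phase can be illustrated with a small Python sketch. It assumes documents arrive as fixed-length word-count vectors and that all documents in a clump share the same variational parameters; under that assumption, the likelihood terms involving the topic-word distributions depend on the clump's documents only through their summed word counts, so merging discards the individual documents but not the totals. The names `Clump`, `BoundedClumpCache` and `max_clumps`, and the greedy rule of merging the two clumps whose normalized word frequencies are closest, are hypothetical choices made for illustration only: the abstract does not say how the authors decide which items to merge, and the model building phase (growing the number of topics) is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Clump:
    # Number of data items (documents) merged into this clump.
    n_items: int
    # Summed word counts of all member documents: the only statistic cached.
    counts: List[int]

    def frequencies(self) -> List[float]:
        # Normalized word distribution, used only to choose merge partners.
        total = sum(self.counts)
        if total == 0:
            return [0.0] * len(self.counts)
        return [c / total for c in self.counts]


class BoundedClumpCache:
    """Hypothetical memory-bounded store holding at most `max_clumps` clumps."""

    def __init__(self, vocab_size: int, max_clumps: int) -> None:
        if max_clumps < 1:
            raise ValueError("max_clumps must be at least 1")
        self.vocab_size = vocab_size
        self.max_clumps = max_clumps
        self.clumps: List[Clump] = []

    def add(self, doc_counts: List[int]) -> None:
        if len(doc_counts) != self.vocab_size:
            raise ValueError("document length must equal vocab_size")
        # Each new document starts as a singleton clump.
        self.clumps.append(Clump(1, list(doc_counts)))
        # Compress until the memory constraint is satisfied again.
        while len(self.clumps) > self.max_clumps:
            self._merge_closest_pair()

    def _merge_closest_pair(self) -> None:
        # Greedy stand-in criterion: squared Euclidean distance between
        # normalized word frequencies (an assumption, not the paper's rule).
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(self.clumps)):
            fi = self.clumps[i].frequencies()
            for j in range(i + 1, len(self.clumps)):
                fj = self.clumps[j].frequencies()
                dist = sum((a - b) ** 2 for a, b in zip(fi, fj))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        assert best is not None  # at least two clumps exist when called
        _, i, j = best
        a, b = self.clumps[i], self.clumps[j]
        merged = Clump(a.n_items + b.n_items,
                       [x + y for x, y in zip(a.counts, b.counts)])
        # Delete j first (j > i) so that index i remains valid.
        del self.clumps[j]
        self.clumps[i] = merged

    def total_counts(self) -> List[int]:
        # Corpus-level word counts are preserved exactly by merging.
        return [sum(c.counts[w] for c in self.clumps)
                for w in range(self.vocab_size)]
```

For example, adding the count vectors [5, 0, 0], [4, 1, 0] and [0, 0, 6] with `max_clumps=2` merges the first two documents into a single clump with counts [9, 1, 0], while `total_counts()` still returns [9, 1, 6]. The memory used depends on `max_clumps` and the vocabulary size, not on how many documents have been seen.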
