Fast collapsed Gibbs sampling for latent Dirichlet allocation
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
SESSION: Research papers
Pages 569-577  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Ian Porteous  University of California Irvine, Irvine, CA, USA
David Newman  University of California Irvine, Irvine, CA, USA
Alexander Ihler  University of California Irvine, Irvine, CA, USA
Arthur Asuncion  University of California Irvine, Irvine, CA, USA
Padhraic Smyth  University of California Irvine, Irvine, CA, USA
Max Welling  University of California Irvine, Irvine, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1401890.1401960

ABSTRACT

In this paper we introduce a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method results in significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method, FastLDA, draws equivalent samples but requires on average significantly fewer than K operations per sample. On real-world corpora FastLDA can be as much as 8 times faster than the standard collapsed Gibbs sampler for LDA. No approximations are necessary, and we show that our fast sampling scheme produces exactly the same results as the standard (but slower) sampling scheme. Experiments on four real-world data sets demonstrate speedups for a wide range of collection sizes. For the PubMed collection of over 8 million documents, with a required computation time of 6 CPU months for LDA, our speedup of 5.7 can save 5 CPU months of computation.
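To make the O(K) cost per sample concrete, here is a minimal sketch of one sweep of the conventional collapsed Gibbs sampler for LDA, the baseline the abstract compares against. It is not the paper's fast method. The function names (`init_state`, `gibbs_sweep`), the count-table layout, and the symmetric hyperparameters `alpha` and `beta` are illustrative assumptions and do not come from the abstract.

```python
import random


def init_state(docs, num_topics, vocab_size, rng):
    """Assign a random topic to every token and build the count tables.

    docs is a list of documents, each a list of integer word ids.
    (Hypothetical helper, for illustration only.)
    """
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_wk = [[0] * num_topics for _ in range(vocab_size)]  # word-topic counts
    n_k = [0] * num_topics                                # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1
            n_wk[w][k] += 1
            n_k[k] += 1
    return z, n_dk, n_wk, n_k


def gibbs_sweep(docs, z, n_dk, n_wk, n_k, alpha, beta, vocab_size, rng):
    """Resample the topic of every token once (standard sampler, assumed
    symmetric Dirichlet priors alpha and beta)."""
    num_topics = len(n_k)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # Remove the current token from all counts.
            n_dk[d][k_old] -= 1
            n_wk[w][k_old] -= 1
            n_k[k_old] -= 1
            # The O(K) step: one unnormalized weight per topic.
            weights = [
                (n_dk[d][k] + alpha) * (n_wk[w][k] + beta)
                / (n_k[k] + vocab_size * beta)
                for k in range(num_topics)
            ]
            # Draw the new topic by walking the cumulative sum.
            u = rng.random() * sum(weights)
            k_new = num_topics - 1
            acc = 0.0
            for k, weight in enumerate(weights):
                acc += weight
                if u < acc:
                    k_new = k
                    break
            # Add the token back under its new topic.
            z[d][i] = k_new
            n_dk[d][k_new] += 1
            n_wk[w][k_new] += 1
            n_k[k_new] += 1
```

Every token visit builds all K weights before drawing a topic, which is the per-sample cost the abstract says the proposed method reduces on average while drawing equivalent samples.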



