| Fast collapsed Gibbs sampling for latent Dirichlet allocation |
Source
|
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
SESSION: Research papers
Pages 569-577
Year of Publication: 2008
ISBN: 978-1-60558-193-4
Authors
Ian Porteous, University of California Irvine, Irvine, CA, USA
David Newman, University of California Irvine, Irvine, CA, USA
Alexander Ihler, University of California Irvine, Irvine, CA, USA
Arthur Asuncion, University of California Irvine, Irvine, CA, USA
Padhraic Smyth, University of California Irvine, Irvine, CA, USA
Max Welling, University of California Irvine, Irvine, CA, USA
ABSTRACT
In this paper we introduce FastLDA, a novel collapsed Gibbs sampling method for the widely used latent Dirichlet allocation (LDA) model. Our new method yields significant speedups on real-world text corpora. Conventional Gibbs sampling schemes for LDA require O(K) operations per sample, where K is the number of topics in the model. Our proposed method draws equivalent samples but requires, on average, significantly fewer than K operations per sample. On real-world corpora FastLDA can be as much as 8 times faster than the standard collapsed Gibbs sampler for LDA. No approximations are necessary, and we show that our fast sampling scheme produces exactly the same results as the standard (but slower) sampling scheme. Experiments on four real-world data sets demonstrate speedups for a wide range of collection sizes. For the PubMed collection of over 8 million documents, with a required computation time of 6 CPU months for LDA, our speedup of 5.7 can save 5 CPU months of computation.
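For orientation, the O(K) cost that the abstract attributes to conventional schemes can be seen in a minimal sketch of the standard collapsed Gibbs update for a single token. This is the baseline sampler, not FastLDA itself, whose details are not given in the abstract. The function name `sample_topic`, its parameter names, and the symmetric hyperparameters `alpha` and `beta` are assumptions chosen for illustration; the count arrays are assumed to already exclude the token being resampled.

```python
import random


def sample_topic(doc_topic, word_topic, topic_total, alpha, beta, vocab_size, rng):
    """Draw a topic for one token using the standard collapsed Gibbs update.

    doc_topic[k]   : tokens in this document assigned to topic k (hypothetical name)
    word_topic[k]  : occurrences of this word assigned to topic k (hypothetical name)
    topic_total[k] : all tokens assigned to topic k (hypothetical name)
    All counts are assumed to exclude the token being resampled.
    """
    num_topics = len(topic_total)

    # Every one of the K topics is visited to build the unnormalized
    # conditional, which is the O(K) cost per sample.
    weights = []
    for k in range(num_topics):
        weight = (doc_topic[k] + alpha) * (word_topic[k] + beta)
        weight /= topic_total[k] + vocab_size * beta
        weights.append(weight)

    # Inverse-CDF draw from the unnormalized weights.
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for k, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return k
    return num_topics - 1  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_topic([2, 1, 0], [1, 0, 4], [7, 5, 9], 0.1, 0.01, 20, rng))
```

The loop over all K topics before any draw can be made is the per-sample cost that, according to the abstract, the proposed method reduces on average while still producing equivalent samples.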