Fully distributed EM for very large datasets
Source ICML; Vol. 307 archive
Proceedings of the 25th international conference on Machine learning table of contents
Helsinki, Finland
Pages 1184-1191  
Year of Publication: 2008
ISBN:978-1-60558-205-4
Authors
Jason Wolfe  University of California, Berkeley, CA
Aria Haghighi  University of California, Berkeley, CA
Dan Klein  University of California, Berkeley, CA
Sponsors
Yahoo!
Xerox
IBM
NSF
Microsoft Research
Machine Learning Journal/Springer
Pascal
University of Helsinki
Federation of Finnish Learned Societies
Intel Corporation
Google
Helsinki Institute for Information Technology
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390156.1390305

ABSTRACT

In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework that fully distributes the entire EM procedure. Each node interacts only with parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce topology, on two tasks: word alignment and topic modeling.
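
To make the first two sentences of the abstract concrete, the following is a minimal sketch of a MapReduce-style EM loop, written for an IBM Model 1 style word-alignment model. It is not the paper's junction-tree method; it illustrates the baseline arrangement in which the E-step runs independently per data shard and a single reducer gathers every parameter for the M-step. All function names, the toy corpus, and the uniform initialization are assumptions made for illustration.

```python
from collections import defaultdict

def init_uniform(shards):
    # Assumption: start uniform over target words that co-occur with each source word.
    pairs = {(e, f) for shard in shards for src, tgt in shard for e in src for f in tgt}
    return m_step({pair: 1.0 for pair in pairs})

def e_step(shard, t):
    # Expected alignment counts for one shard. Only parameters for word
    # pairs that occur in this shard are read or written.
    counts = defaultdict(float)
    for src, tgt in shard:
        for f in tgt:
            z = sum(t[(e, f)] for e in src)
            for e in src:
                counts[(e, f)] += t[(e, f)] / z
    return counts

def reduce_counts(partials):
    # Central reduce: sums the sparse counts from all shards in one place.
    total = defaultdict(float)
    for part in partials:
        for key, c in part.items():
            total[key] += c
    return total

def m_step(total):
    # Normalize expected counts so that t(f | e) sums to 1 for each source word e.
    norm = defaultdict(float)
    for (e, f), c in total.items():
        norm[e] += c
    return {(e, f): c / norm[e] for (e, f), c in total.items()}

def em(shards, iterations=10):
    t = init_uniform(shards)
    for _ in range(iterations):
        # Each call is independent given t, so it could run on a separate node.
        partials = [e_step(shard, t) for shard in shards]
        t = m_step(reduce_counts(partials))
    return t

# Hypothetical toy corpus, split into two shards of (source, target) sentence pairs.
shards = [
    [(["the", "house"], ["das", "haus"])],
    [(["the", "book"], ["das", "buch"]), (["a", "book"], ["ein", "buch"])],
]
t = em(shards)
```

Each shard's partial counts contain only the word pairs seen in that shard, which is the sparsity the abstract relies on. In this sketch, however, `reduce_counts` and `m_step` still hold the full parameter table on one node, which is the step the abstract describes as impractical for very large data sets.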



