| Fully distributed EM for very large datasets |
| Full text |
Pdf
(507 KB)
|
| Source
|
ICML; Vol. 307
archive
Proceedings of the 25th international conference on Machine learning
table of contents
Helsinki, Finland
Pages 1184-1191
Year of Publication: 2008
ISBN:978-1-60558-205-4
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 10, Downloads (12 Months): 101, Citation Count: 1
|
|
|
ABSTRACT
In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of the parameters in a single node for the M-step can be impractical. We present a framework that fully distributes the entire EM procedure. Each node interacts only with parameters relevant to its data, sending messages to other nodes along a junction-tree topology. We demonstrate improvements over a MapReduce topology, on two tasks: word alignment and topic modeling.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
Chu, C.-T., Kim, S. K., Lin, Y.-A., Yu, Y., Bradski, G., Ng, A. Y., & Olukotum, K. (2006). Map-Reduce for Machine Learning on Multicore. NIPS.
|
| |
4
|
|
| |
5
|
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society.
|
| |
6
|
|
| |
7
|
Newman, D., Asuncion, A., Smyth, P., & Welling, M. (2008). Distributed Inference for Latent Dirichlet Allocation. NIPS.
|
| |
8
|
Nowak, R. (2003). Distributed EM algorithms for density estimation and clustering in sensor networks. IEEE Transactions on Signal Processing, 51, 2245--2253.
|
| |
9
|
|
| |
10
|
|
|