FREM: fast and robust EM clustering for large data sets
Source: Conference on Information and Knowledge Management archive
Proceedings of the Eleventh International Conference on Information and Knowledge Management
McLean, Virginia, USA
Session: Clustering algorithms
Pages: 590-599
Year of Publication: 2002
ISBN: 1-58113-492-4
Authors
Carlos Ordonez (Teradata, a division of NCR, San Diego, CA)
Edward Omiecinski (Georgia Institute of Technology, Atlanta, GA)
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/584792.584889

ABSTRACT

Clustering is a fundamental data mining technique. This article presents an improved EM (Expectation-Maximization) algorithm for clustering large data sets that exhibit high dimensionality, noise, and zero-variance problems. The algorithm incorporates improvements that increase both the quality of the solutions and the speed. In general, the algorithm can find a good clustering solution in three scans over the data set; alternatively, it can be run until it converges. The algorithm has a few parameters, which are easy to set and have defaults suitable for most cases. The proposed algorithm is compared against the standard EM algorithm and the On-Line EM algorithm.
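The abstract does not give FREM's update rules, so the sketch below shows only the standard batch EM baseline it is compared against: a mixture of Gaussians with diagonal covariances, written in pure Python. The function name `em_diagonal_gmm`, the stride-based initialization and the `var_floor` parameter are assumptions made for illustration, not the paper's method. Clamping variances from below is one common way to keep a zero-variance dimension from producing log(0) or a division by zero. Each iteration is one E step plus one M step; an implementation that accumulates the M-step sums while computing responsibilities needs one scan of the data per iteration, so `n_iter=3` loosely mirrors a three-scan budget (FREM's own schedule may differ).

```python
import math


def em_diagonal_gmm(points, k, n_iter=3, var_floor=1e-3):
    """Standard batch EM for a k-component Gaussian mixture with diagonal covariances.

    points: list of equal-length numeric sequences. Returns (weights, means,
    variances, log_lik), where log_lik is the log-likelihood computed in the
    last E step (i.e. under the parameters that step started from).
    """
    n, d = len(points), len(points[0])
    # Hypothetical initialization: spread the initial means across the input order.
    means = [list(points[(j * n) // k]) for j in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    log_lik = float("-inf")

    for _ in range(n_iter):
        # E step: responsibilities resp[i][j] = P(cluster j | point i),
        # computed in log space with the log-sum-exp trick for stability.
        resp, log_lik = [], 0.0
        for x in points:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for t in range(d):
                    v = variances[j][t]
                    diff = x[t] - means[j][t]
                    lp -= 0.5 * (math.log(2.0 * math.pi * v) + diff * diff / v)
                logs.append(lp)
            top = max(logs)
            total = sum(math.exp(lg - top) for lg in logs)
            log_lik += top + math.log(total)
            resp.append([math.exp(lg - top) / total for lg in logs])

        # M step: re-estimate weights, means and variances from weighted sums.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-10:
                continue  # cluster received no points: keep its old parameters
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                        for t in range(d)]
            # Clamp each variance from below (hypothetical var_floor) so a
            # zero-variance dimension cannot break the next E step.
            variances[j] = [max(var_floor,
                                sum(r[j] * (x[t] - means[j][t]) ** 2
                                    for r, x in zip(resp, points)) / nj)
                            for t in range(d)]
        norm = sum(weights)
        weights = [w / norm for w in weights]

    return weights, means, variances, log_lik


if __name__ == "__main__":
    # Two groups; the second coordinate is constant inside each group,
    # so its within-cluster variance is zero and gets clamped to var_floor.
    data = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.2, 0.0), (-0.2, 0.0),
            (10.0, 5.0), (10.5, 5.0), (9.5, 5.0), (10.2, 5.0), (9.8, 5.0)]
    w, mu, var, ll = em_diagonal_gmm(data, k=2, n_iter=3)
    print(w, mu, var, ll)
```

Without the floor, the constant second coordinate in this toy data would drive that variance to zero after the first M step and the next E step would fail on `math.log(0.0)`. Running until convergence instead of for a fixed `n_iter` would mean looping until the change in `log_lik` falls below a tolerance.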


