Clustering binary data streams with K-means
Source: Data Mining And Knowledge Discovery archive
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
San Diego, California
Session: Data streams I
Pages: 12 - 19
Year of Publication: 2003
Author
Carlos Ordonez  Teradata, a division of NCR, San Diego, CA
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/882082.882087

ABSTRACT

Clustering data streams is an interesting Data Mining problem. This article presents three variants of the K-means algorithm for clustering binary data streams: On-line K-means, Scalable K-means, and Incremental K-means, a proposed variant that finds higher-quality solutions in less time. The higher solution quality is obtained with a mean-based initialization and incremental learning. The speedup is achieved through a simplified set of sufficient statistics and operations with sparse matrices. A summary table of clusters is maintained on-line. The K-means variants are compared with respect to quality of results and speed. The proposed algorithms can be used to monitor transactions.
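
The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea of on-line K-means over sparse binary points, not the article's algorithm. It assumes each point arrives as the set of dimensions that are 1, and that each cluster keeps just two sufficient statistics: a point count and a per-dimension sum. For binary data the per-dimension sum of squares equals the sum, since x*x = x when x is 0 or 1, which is one way a set of sufficient statistics can be simplified. The class name, method names, and the seed-point initialization are hypothetical; the mean-based initialization mentioned in the abstract is not reproduced here.

```python
from typing import List, Set


class BinaryStreamKMeans:
    """Hypothetical on-line K-means sketch for sparse binary points.

    A point is the set of dimension indices whose value is 1.
    """

    def __init__(self, k: int, d: int, seeds: List[Set[int]]):
        # Assumption: one seed point per cluster (not the paper's initialization).
        assert len(seeds) == k
        self.k, self.d = k, d
        self.n = [1] * k                         # N_j: points assigned to cluster j
        self.m = [[0.0] * d for _ in range(k)]   # M_j: per-dimension sums
        for j, seed in enumerate(seeds):
            for i in seed:
                self.m[j][i] = 1.0

    def centroid(self, j: int) -> List[float]:
        # Centroid is the mean: C_j = M_j / N_j.
        return [s / self.n[j] for s in self.m[j]]

    def sq_distance(self, j: int, point: Set[int]) -> float:
        # ||x - C||^2 = ||C||^2 + sum over nonzero dims of (1 - 2 * C_i),
        # because (1 - C_i)^2 - C_i^2 = 1 - 2 * C_i.
        # For simplicity ||C||^2 is recomputed here; a real implementation
        # would cache it so only the nonzero dimensions are visited.
        c = self.centroid(j)
        null_dist = sum(v * v for v in c)
        return null_dist + sum(1.0 - 2.0 * c[i] for i in point)

    def update(self, point: Set[int]) -> int:
        # Assign the point to the nearest centroid, then update that
        # cluster's statistics, touching only the nonzero dimensions.
        j = min(range(self.k), key=lambda c: self.sq_distance(c, point))
        self.n[j] += 1
        for i in point:
            self.m[j][i] += 1.0
        return j
```

Each call to `update` processes one point and discards it, so the per-cluster counts and sums act as a summary of the stream seen so far.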


