Can exclusive clustering on streaming data be achieved?
Source
ACM SIGKDD Explorations Newsletter
Volume 8, Issue 2 (December 2006)
Pages: 102-108
Year of Publication: 2006
ISSN: 1931-0145
Authors
Maria E. Orlowska, The University of Queensland, Brisbane, Australia
Xingzhi Sun, The University of Queensland, Brisbane, Australia
Xue Li, The University of Queensland, Brisbane, Australia
ABSTRACT
Clustering on streaming data aims to partition a list of data points into k groups of "similar" objects by scanning the data once. Most current one-scan clustering algorithms do not keep the original data in the resulting clusters. The output of these algorithms is therefore not the clustered data points but approximations of data properties according to a predefined similarity function, such that k centers and radii reflect the up-to-date data grouping. In this paper, we raise a critical question: can partition-based clustering, or exclusive clustering, be achieved on streaming data by the currently available algorithms? After identifying the differences between traditional clustering and clustering on data streams, we discuss the basic requirements for clusters that can be discovered from streaming data. We evaluate recent work based on a subcluster maintenance approach. Using a few straightforward examples, we illustrate that the subcluster maintenance approach may fail to achieve exclusive clustering on data streams. Based on our observations, we also present the challenges facing any heuristic method that claims to solve the clustering problem on data streams in general.
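To make the abstract's point about one-scan output concrete, the following is a minimal Python sketch of summary-based stream clustering. It is not the method evaluated in the paper; it is a generic illustration under stated assumptions. The names `Summary` and `one_scan` and the distance `threshold` parameter are hypothetical, and for simplicity the number of summaries is governed by the threshold rather than capped at k. Each summary stores only a count, a linear sum, and a sum of squared norms, from which a center and a radius can be derived.

```python
import math


class Summary:
    """Running summary of one group: count, linear sum, sum of squared norms.

    Hypothetical helper for illustration; the original points are never stored.
    """

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = sum(x * x for x in point)

    def add(self, point):
        # Absorb a point by updating the three statistics only.
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def center(self):
        return [a / self.n for a in self.ls]

    def radius(self):
        # Root-mean-square distance of the absorbed points from the center.
        c2 = sum(c * c for c in self.center())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def one_scan(stream, threshold):
    """Read each point once; absorb it into the nearest summary or open a new one.

    `threshold` is an assumed parameter: the maximum distance from a summary's
    current center at which a point is still absorbed.
    """
    summaries = []
    for p in stream:
        nearest = min(summaries, key=lambda s: math.dist(s.center(), p), default=None)
        if nearest is not None and math.dist(nearest.center(), p) <= threshold:
            nearest.add(p)
        else:
            summaries.append(Summary(p))
    return summaries


if __name__ == "__main__":
    for s in one_scan([(0, 0), (0, 2), (10, 10), (10, 12)], threshold=3.0):
        print(s.n, s.center(), s.radius())
```

The result is a list of centers and radii with no record of which point went into which group, so a point-to-cluster partition cannot be read back from the output. This is the gap between summary maintenance and exclusive clustering that the abstract questions.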