ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Incremental and effective data summarization for dynamic hierarchical clustering
Full text PdfPdf (235 KB)
Source International Conference on Management of Data archive
Proceedings of the 2004 ACM SIGMOD international conference on Management of data table of contents
Paris, France
SESSION: Research sessions: clustering table of contents
Pages: 467 - 478  
Year of Publication: 2004
ISBN:1-58113-859-8
Authors
Samer Nassar  University of Alberta, Edmonton, Alberta, Canada
Jörg Sander  University of Alberta, Edmonton, Alberta, Canada
Corrine Cheng  University of Alberta, Edmonton, Alberta, Canada
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 129,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1007568.1007621
What is a DOI?

Warning: The download time has expired please click on the item to try again.


ABSTRACT

Mining informative patterns from very large, dynamically changing databases poses numerous interesting challenges. Data summarizations (e.g., data bubbles) have been proposed to compress very large static databases into representative points suitable for subsequent effective hierarchical cluster analysis. In many real world applications, however, the databases dynamically change due to frequent insertions and deletions, possibly changing the data distribution and clustering structure over time. Completely reapplying both the data summarization and the clustering algorithm to detect the changes in the clustering structure and update the uncovered data patterns following such deletions and insertions is prohibitively expensive for large fast changing databases. In this paper, we propose a new scheme to maintain data bubbles incrementally. By using incremental data bubbles, a high-quality hierarchical clustering is quickly available at any point in time. In our scheme, a quality measure for incremental data bubbles is used to identify data bubbles that do not compress well their underlying data points after certain insertions and deletions. Only these data bubbles are re-built using efficient split and merge operations. An extensive experimental evaluation shows that the incremental data bubbles provide significantly faster data summarization than completely re-building the data bubbles after a certain number of insertions and deletions, and are effective in preserving (and in some cases even improving) the quality of the data summarization.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Aggarwal, C. C., Han, J., Wang, J., Yu. P. S. A Framework for Clustering Evolving Data Streams. In VLDB'03, 81--92, 2003.
2
 
3
4
5
6
 
7
 
8
Elkan, C. Using the Triangle Inequality to Accelerate k-means. In ICML'03, 2003.
 
9
Ester, M., Kriegel, H-P., Sander, J., Xu, X. A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD'96, 226--231, 1996.
 
10
 
11
12
13
 
14
MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In 5th Berkeley Symp. Math. Statist. Prob., 281--297, 1967.
 
15
 
16
Sander, J., Qin, X., Lu. Z., Niu, N. Kovarsky, A. Automated Extraction of Clusters from Hierarchical Clustering Representations. PAKDD'03.
 
17
Sibson, R. SLINK: An Optimally Efficient Algorithm for the Single-link Cluster Method. The Computer Journal, 16(1):30--34, 1973.
 
18
Triola, M. F., Goodman, W. M., Law, R. Elementary Statistics First Canadian Edition, Addison-Wesley, Don Mills, Ontario, 1999.
 
19
20


Collaborative Colleagues:
Samer Nassar: colleagues
Jörg Sander: colleagues
Corrine Cheng: colleagues