Evolutionary clustering
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Philadelphia, PA, USA
Session: Research track posters
Pages: 554-560
Year of Publication: 2006
ISBN: 1-59593-339-5
Authors
Deepayan Chakrabarti  Yahoo! Research, Sunnyvale, CA
Ravi Kumar  Yahoo! Research, Sunnyvale, CA
Andrew Tomkins  Yahoo! Research, Sunnyvale, CA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Citation Count: 11
DOI: http://doi.acm.org/10.1145/1150402.1150467

ABSTRACT

We consider the problem of clustering data over time. An evolutionary clustering should simultaneously optimize two potentially conflicting criteria: first, the clustering at any point in time should remain faithful to the current data as much as possible; and second, the clustering should not shift dramatically from one timestep to the next. We present a generic framework for this problem, and discuss evolutionary versions of two widely-used clustering algorithms within this framework: k-means and agglomerative hierarchical clustering. We extensively evaluate these algorithms on real data sets and show that our algorithms can simultaneously attain both high accuracy in capturing today's data, and high fidelity in reflecting yesterday's clustering.
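
The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm; it is one plausible way to make k-means time-aware, under stated assumptions: each centroid is pulled toward the nearest centroid from the previous timestep, and a hypothetical weight `cp` in [0, 1] controls how strongly. The function names (`snapshot_cost`, `history_cost`, `evolutionary_kmeans_step`) and the specific cost definitions are illustrative choices, not taken from the abstract.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def snapshot_cost(points, centroids):
    """Assumed fidelity measure: mean distance from each point to its centroid."""
    labels = assign(points, centroids)
    return sum(dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)


def history_cost(prev_centroids, centroids):
    """Assumed drift measure: mean distance to the nearest previous centroid."""
    return sum(min(dist(c, q) for q in prev_centroids)
               for c in centroids) / len(centroids)


def evolutionary_kmeans_step(points, prev_centroids, cp=0.5, iters=10):
    """One timestep of a smoothed k-means (illustrative, not the paper's method).

    cp = 0 reduces to ordinary k-means seeded with yesterday's centroids;
    cp = 1 keeps yesterday's centroids unchanged.
    """
    centroids = list(prev_centroids)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            if not members:
                updated.append(c)  # keep an empty cluster's centroid in place
                continue
            m = mean(members)
            # Anchor: the previous-timestep centroid closest to this one.
            anchor = min(prev_centroids, key=lambda q: dist(c, q))
            updated.append(tuple((1 - cp) * mi + cp * ai
                                 for mi, ai in zip(m, anchor)))
        centroids = updated
    return centroids


if __name__ == "__main__":
    yesterday = [(0, 0), (10, 10)]
    today = [(1, 1), (1, 3), (11, 11), (11, 13)]
    for cp in (0.0, 0.5, 1.0):
        c = evolutionary_kmeans_step(today, yesterday, cp=cp)
        print(cp, c, snapshot_cost(today, c), history_cost(yesterday, c))
```

In this toy setting, raising `cp` lowers the drift from yesterday's centroids at the price of a worse fit to today's points, which is the tension between the two criteria the abstract names.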

