ABSTRACT
We consider the problem of clustering data over time. An evolutionary clustering should simultaneously optimize two potentially conflicting criteria: first, the clustering at any point in time should remain faithful to the current data as much as possible; and second, the clustering should not shift dramatically from one timestep to the next. We present a generic framework for this problem, and discuss evolutionary versions of two widely used clustering algorithms within this framework: k-means and agglomerative hierarchical clustering. We extensively evaluate these algorithms on real data sets and show that they can attain both high accuracy in capturing today's data and high fidelity in reflecting yesterday's clustering.
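To make the trade-off between the two criteria concrete, here is a minimal sketch of one timestep of a k-means variant that is pulled toward the previous timestep's centroids. It is an illustration only, not necessarily the algorithm from the paper: the function names, the `history_weight` parameter, the linear blending rule, and the matching of centroids across timesteps by index are all assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def evolutionary_kmeans_step(points, prev_centroids, history_weight=0.5, iterations=10):
    """Cluster the current points while staying close to the previous centroids.

    history_weight is a hypothetical knob for this sketch:
      0.0 -> ordinary Lloyd iterations seeded with the previous centroids
      1.0 -> centroids stay exactly where they were at the previous timestep
    Centroid j is assumed to correspond to previous centroid j.
    """
    centroids = [list(c) for c in prev_centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest current centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: blend the mean of today's members with yesterday's centroid.
        for j, members in enumerate(clusters):
            if not members:
                continue  # an empty cluster keeps its current centroid
            mean = [sum(coords) / len(members) for coords in zip(*members)]
            centroids[j] = [
                (1 - history_weight) * m + history_weight * q
                for m, q in zip(mean, prev_centroids[j])
            ]
    return centroids


today = [(0, 0), (0, 2), (10, 0), (10, 2)]
yesterday_centroids = [(1, 1), (9, 1)]
print(evolutionary_kmeans_step(today, yesterday_centroids, history_weight=0.5))
```

With `history_weight=0.0` the result depends only on today's data, and with `history_weight=1.0` it depends only on yesterday's clustering; intermediate values trade one criterion against the other.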