ABSTRACT
Cluster ensembles offer a solution to challenges inherent to clustering arising from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this article, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus functions make use of the weight vectors associated with the clusters. We demonstrate the effectiveness of our techniques by running experiments with several real datasets, including high-dimensional text data. Furthermore, we investigate in depth the issue of diversity and accuracy for our ensemble methods. Our analysis and experimental results show that the proposed techniques are capable of producing a partition that is as good as or better than the best individual clustering.