Combining partitions by probabilistic label aggregation
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Chicago, Illinois, USA
SESSION: Research track paper
Pages: 147 - 156  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
Tilman Lange  ETH Zurich, Zurich, Switzerland
Joachim M. Buhmann  ETH Zurich, Zurich, Switzerland
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1081870.1081890

ABSTRACT

Data clustering represents an important tool in exploratory data analysis. The lack of objective criteria renders model selection, as well as the identification of robust solutions, particularly difficult. Stability assessment and the combination of multiple clustering solutions are important ingredients for finding useful partitions. In this work, we propose a novel way of combining multiple clustering solutions for both hard and soft partitions: the approach is based on modeling the probability that two objects are grouped together. An efficient EM optimization strategy is employed to estimate the model parameters. Our proposal can also be extended to emphasize the signal more strongly by weighting individual base clustering solutions according to their consistency with the prediction for previously unseen objects. In addition, the probabilistic model supports an out-of-sample extension that (i) makes it possible to assign previously unseen objects to classes of the combined solution and (ii) renders the efficient aggregation of solutions possible. We also shed some light on the usefulness of such combination approaches. In the experimental results section, we demonstrate the competitive performance of our proposal in comparison with other recently proposed methods for combining multiple classifications of a finite data set.
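
The abstract's central quantity, the probability that two objects are grouped together, can be illustrated with a simple empirical estimate: the fraction of base clusterings that place a pair of objects in the same cluster. The sketch below is only that illustration. It is not the EM-based model from the paper, it handles hard partitions only, and the function name `co_association` and the toy label vectors are assumptions introduced here.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base clusterings that assign objects i and j to the same cluster.

    partitions: list of label sequences, one per base clustering,
    each of length n. Label values only need to be comparable within
    a single clustering, so relabeling a clustering changes nothing.
    """
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same n objects")

    m = len(partitions)
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        probs[i][i] = 1.0  # an object is always grouped with itself

    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in partitions if labels[i] == labels[j])
        probs[i][j] = probs[j][i] = together / m
    return probs


# Three hypothetical base clusterings of five objects.
# The second one uses swapped label names on purpose.
base = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
P = co_association(base)
print(P[0][1])  # objects 0 and 1 are together in every clustering
print(P[2][3])  # objects 2 and 3 are together in two of three
```

Because only pairwise co-membership is compared, the estimate does not depend on how each base clustering names its clusters, which is why pairwise grouping is a convenient basis for combining partitions whose labels are not aligned.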


