ACM Home Page
Please provide us with feedback. Feedback
Improving clustering stability with combinatorial MRFs
Full text MovMov (26:58),  PdfPdf (584 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Paris, France
SESSION: Research track papers table of contents
Pages 99-108  
Year of Publication: 2009
ISBN:978-1-60558-495-9
Authors
Ron Bekkerman  HP Labs, Palo Alto, CA, USA
Martin Scholz  HP Labs, Palo Alto, CA, USA
Krishnamurthy Viswanathan  HP Labs, Palo Alto, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 39,   Downloads (12 Months): 140,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1557019.1557037
What is a DOI?

ABSTRACT

As clustering methods are often sensitive to parameter tuning, obtaining stability in clustering results is an important task. In this work, we aim at improving clustering stability by attempting to diminish the influence of algorithmic inconsistencies and enhance the signal that comes from the data. We propose a mechanism that takes m clusterings as input and outputs m clusterings of comparable quality, which are in higher agreement with each other. We call our method the Clustering Agreement Process (CAP). To preserve the clustering quality, CAP uses the same optimization procedure as used in clustering. In particular, we study the stability problem of randomized clustering methods (which usually produce different results at each run). We focus on methods that are based on inference in a combinatorial Markov Random Field (or Comraf, for short) of a simple topology. We instantiate CAP as inference within a more complex, bipartite Comraf. We test the resulting system on four datasets, three of which are medium-sized text collections, while the fourth is a large-scale user/movie dataset. First, in all the four cases, our system significantly improves the clustering stability measured in terms of the macro-averaged Jaccard index. Second, in all the four cases our system managed to significantly improve clustering quality as well, achieving the state-of-the-art results. Third, our system significantly improves stability of consensus clustering built on top of the randomized clustering solutions.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
L. Badea. Clustering and metaclustering with nonnegative matrix decompositions. In Proceedings of ECML-16, pages 10--22, 2005.
2
 
3
R. Bekkerman and J. Jeon. Multi-modal clustering for multimedia collections. In Proceedings of CVPR'07, the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007.
 
4
R. Bekkerman, M. Sahami, and E. Learned-Miller. Combinatorial Markov Random Fields. In Proceedings of ECML-17, 2006.
5
 
6
S. Ben-David, U. von Luxburg, and D. Pál. A sober look at clustering stability. In Proceedings of COLT-19, pages 5--19, 2006.
 
7
A. Ben-Hur, A. Elisseeff, and I. Guyon. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing, pages 6--17, 2002.
 
8
J. Besag. Spatial interaction and statistical analysis of lattice systems. Journal of the Royal Statistical Society, 36(2):192--236, 1974.
 
9
J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, 48(3), 1986.
10
11
 
12
P. Jaccard. The distribution of flora in the alpine zone. New Phytologist, 11:37--50, 1912.
 
13
A. Kreiger and P. Green. A generalized rand-index method for consensus clustering of separate partitions of the same data base. Journal of Classification, 16:63--89, 1999.
 
14
 
15
 
16
 
17
 
18
 
19
 
20
W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846--850, 1971.
 
21
 
22

Collaborative Colleagues:
Ron Bekkerman: colleagues
Martin Scholz: colleagues
Krishnamurthy Viswanathan: colleagues