Non-redundant clustering with conditional ensembles
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining
Chicago, Illinois, USA
Session: Research track paper
Pages: 70 - 77  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
David Gondek  IBM T. J. Watson Research Center, Hawthorne, NY
Thomas Hofmann  Fraunhofer IPSI, Darmstadt, Germany
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 62,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/1081870.1081882

ABSTRACT

Data may often contain multiple plausible clusterings. In order to discover a clustering which is useful to the user, constrained clustering techniques have been proposed to guide the search. Typically, these techniques assume background knowledge in the form of explicit information about the desired clustering. In contrast, we consider the setting in which the background knowledge is instead about an undesired clustering. Such knowledge may be obtained from an existing classification or precedent algorithm. The problem is then to find a novel, "orthogonal" clustering in the data. We present a general algorithmic framework which makes use of cluster ensemble methods to solve this problem. One key advantage of this approach is that it takes a base clustering method which is used as a black box, allowing the practitioner to select the most appropriate clustering method for the domain. We present experimental results on synthetic and text data which establish the competitiveness of this framework.
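
As a small illustration of what a novel, "orthogonal" clustering could mean in practice, the sketch below scores candidate clusterings against a known, undesired one. It is not the framework from the paper. The use of mutual information as the redundancy score, the function name `mutual_information`, and the toy data are all assumptions made for illustration; the abstract does not say how redundancy is measured.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items.

    Illustrative helper (an assumption, not taken from the paper): a value
    near zero means the two clusterings are statistically independent, i.e.
    knowing one tells you nothing about the other.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written in terms of counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy data (hypothetical): points at the four corners of a unit square,
# so splitting on x and splitting on y are both plausible clusterings.
points = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5

known = [x for x, _ in points]          # the undesired clustering: split on x
redundant = [1 - x for x, _ in points]  # same split with the labels swapped
novel = [y for _, y in points]          # a different split: on y

print("redundant vs. known:", mutual_information(known, redundant))
print("novel vs. known:    ", mutual_information(known, novel))
```

In this toy setting the relabeled split on x shares the maximum possible information with the known clustering (log 2 nats for two balanced clusters), while the split on y shares none, which is the sense in which it is "orthogonal" under this assumed score.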

