|
ABSTRACT
Data may often contain multiple plausible clusterings. In order to discover a clustering which is useful to the user, constrained clustering techniques have been proposed to guide the search. Typically, these techniques assume background knowledge in the form of explicit information about the desired clustering. In contrast, we consider the setting in which the background knowledge is instead about an undesired clustering. Such knowledge may be obtained from an existing classification or precedent algorithm. The problem is then to find a novel, "orthogonal" clustering in the data. We present a general algorithmic framework which makes use of cluster ensemble methods to solve this problem. One key advantage of this approach is that it takes a base clustering method which is used as a black box, allowing the practitioner to select the most appropriate clustering method for the domain. We present experimental results on synthetic and text data which establish the competitiveness of this framework.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Mikhail Bilenko , Sugato Basu , Raymond J. Mooney, Integrating constraints and metric learning in semi-supervised clustering, Proceedings of the twenty-first international conference on Machine learning, p.11, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015360]
|
| |
2
|
L. Bottou and Y. Bengio. Convergence properties of the K-means algorithms. In Advances in Neural Information Processing Systems, volume 7, pages 585--592. MIT Press, 1995.
|
| |
3
|
G. Chechik and N. Tishby. Extracting relevant structures with side information. In Advances in Neural Information Processing Systems, volume 15, pages 857--864. MIT Press, 2002.
|
| |
4
|
Mark Craven , Dan DiPasquo , Dayne Freitag , Andrew McCallum , Tom Mitchell , Kamal Nigam , Seán Slattery, Learning to extract symbolic knowledge from the World Wide Web, Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence, p.509-516, July 1998, Madison, Wisconsin, United States
|
| |
5
|
I. Davidson and A. Satyanarayana. Speeding up k-means clustering by bootstrap averaging. In Proceedings of the Third IEEE International Conference on Data Mining, Workshop on Clustering Large Data Sets, pages 16--25, 2003.
|
| |
6
|
B. Dom. An information-theoretic external cluster-validity measure. In Proceedings of the 18th Annual Conference on Uncertainty in Artificial Intelligence, pages 137--145, 2002.
|
| |
7
|
M. Gluck and J. E. Corter. Information, uncertainty, and the utility of categories. In Proceedings of the Seventh Annual Conference of the Cognitive Science Society, pages 283--287, 1985.
|
| |
8
|
|
| |
9
|
|
| |
10
|
J. Havrda and F. Charvát. Quantification method of classification processes. Concept of structural a-entropy. Kybernetika, 3:30--35, 1967.
|
| |
11
|
|
| |
12
|
M. Meilă. Comparing clusterings by the variation of information. In Proceedings of the 16th Annual Conference on Computational Learning Theory, pages 173--187, 2003.
|
| |
13
|
|
| |
14
|
|
| |
15
|
M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In Advances in Neural Information Processing Systems 16, pages 41--48, 2003.
|
| |
16
|
|
| |
17
|
N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, pages 368--377, 1999.
|
| |
18
|
|
| |
19
|
|
| |
20
|
A. Topchy, A. K. Jain, and W. Punch. A mixture model for clustering ensembles. In Proceedings of the Fourth SIAM Conference on Data Mining, pages 379--390, 2004.
|
| |
21
|
S. Vaithyanathan and D. Gondek. Clustering with informative priors. Technical report, IBM Almaden Research Center, 2002.
|
| |
22
|
|
| |
23
|
E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side information. In Advances in Neural Information Processing Systems 15, pages 505--512, 2002.
|
|