ACM Home Page
Please provide us with feedback. Feedback
Semi-supervised graph clustering: a kernel approach
Full text PdfPdf (854 KB)
Source ACM International Conference Proceeding Series; Vol. 119 archive
Proceedings of the 22nd international conference on Machine learning table of contents
Bonn, Germany
Pages: 457 - 464  
Year of Publication: 2005
ISBN:1-59593-180-5
Authors
Brian Kulis  University of Texas at Austin, Austin, TX
Sugato Basu  University of Texas at Austin, Austin, TX
Inderjit Dhillon  University of Texas at Austin, Austin, TX
Raymond Mooney  University of Texas at Austin, Austin, TX
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 63,   Citation Count: 15
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1102351.1102409
What is a DOI?

ABSTRACT

Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We show that a recently-proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective. A recent theoretical connection between kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2003). Learning distance functions using equivalence relations. Proc. 20th Intl. Conf. on Machine Learning.
3
 
4
Chan, P., Schlag, M., & Zien, J. (1994). Spectral k-way ratio cut partitioning. IEEE Trans. CAD-Integrated Circuits and Systems, 13, 1088--1096.
 
5
6
 
7
Dhillon, I., Guan, Y., & Kulis, B. (2004b). A unified view of kernel k-means, spectral clustering and graph cuts (Technical Report TR-04-25). University of Texas at Austin.
 
8
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. Wiley.
 
9
Kamvar, S. D., Klein, D., & Manning, C. (2003). Spectral learning. Proc. 17th Intl. Joint Conf. on Artificial Intelligence.
 
10
 
11
Lee, I., Date, S. V., Adai, A. T., & Marcotte, E. M. (2004). A probabilistic functional network of yeast genes. Science, 306(5701), 1555--1558.
 
12
Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H., & Kanehisa, M. (1999). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 27, 29--34.
 
13
 
14
Strehl, A., Ghosh, J., & Mooney, R. (2000). Impact of similarity measures on web-page clustering. Workshop on Artificial Intelligence for Web Search (AAAI).
 
15
 
16
Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15.
 
17

CITED BY  16
Collaborative Colleagues:
Brian Kulis: colleagues
Sugato Basu: colleagues
Inderjit Dhillon: colleagues
Raymond Mooney: colleagues