|
ABSTRACT
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We show that a recently-proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective. A recent theoretical connection between kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2003). Learning distance functions using equivalence relations. Proc. 20th Intl. Conf. on Machine Learning.
|
 |
3
|
|
| |
4
|
Chan, P., Schlag, M., & Zien, J. (1994). Spectral k-way ratio cut partitioning. IEEE Trans. CAD-Integrated Circuits and Systems, 13, 1088--1096.
|
| |
5
|
|
 |
6
|
|
| |
7
|
Dhillon, I., Guan, Y., & Kulis, B. (2004b). A unified view of kernel k-means, spectral clustering and graph cuts (Technical Report TR-04-25). University of Texas at Austin.
|
| |
8
|
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. Wiley.
|
| |
9
|
Kamvar, S. D., Klein, D., & Manning, C. (2003). Spectral learning. Proc. 17th Intl. Joint Conf. on Artificial Intelligence.
|
| |
10
|
|
| |
11
|
Lee, I., Date, S. V., Adai, A. T., & Marcotte, E. M. (2004). A probabilistic functional network of yeast genes. Science, 306(5701), 1555--1558.
|
| |
12
|
Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H., & Kanehisa, M. (1999). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 27, 29--34.
|
| |
13
|
|
| |
14
|
Strehl, A., Ghosh, J., & Mooney, R. (2000). Impact of similarity measures on web-page clustering. Workshop on Artificial Intelligence for Web Search (AAAI).
|
| |
15
|
|
| |
16
|
Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15.
|
| |
17
|
|
CITED BY 16
|
|
Rong Ge , Martin Ester , Byron J. Gao , Zengjian Hu , Binay Bhattacharya , Boaz Ben-Moshe, Joint cluster analysis of attribute data and relationship data: The connected k-center problem, algorithms and applications, ACM Transactions on Knowledge Discovery from Data (TKDD), v.2 n.2, p.1-35, July 2008
|
|
|
|
|
|
|
|
|
|
|
|
Donghai Guan , Weiwei Yuan , Young-Koo Lee , Andrey Gavrilov , Sungyoung Lee, Improving supervised learning performance by using fuzzy clustering method to select training data, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, v.19 n.4,5, p.321-334, December 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|