|
ABSTRACT
A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a low-dimensional manifold, where geometric relationships such as pairwise distances are preserved. It can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML firstmaps the data to a high-dimensional space through a kernel function; then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; and finally performs clustering in the low-dimensional space. The performance of NAML depends on the selection of the kernel function and the projection. We show that the joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrated the efficacy of the proposed algorithm.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
E. D. Andersen and K. D. Andersen. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In T. T. H. Frenk, K. Roos and S. Zhang, editors, High Performance Optimization pages 197--232. Kluwer Academic Publishers, 2000.
|
 |
2
|
Andreas Argyriou , Raphael Hauser , Charles A. Micchelli , Massimiliano Pontil, A DC-programming algorithm for kernel selection, Proceedings of the 23rd international conference on Machine learning, p.41-48, June 25-29, 2006, Pittsburgh, Pennsylvania
[doi> 10.1145/1143844.1143850]
|
 |
3
|
Francis R. Bach , Gert R. G. Lanckriet , Michael I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Proceedings of the twenty-first international conference on Machine learning, p.6, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015424]
|
| |
4
|
|
| |
5
|
C. Blake and C. Merz. UCI repository of machine learning databases, 1998.
|
| |
6
|
|
| |
7
|
I. S. Dhillon, Y. Guan, and B. Kulis. A uni?ed view of kernel k-means, spectral clustering and graph partitioning. Technical report, TR-04-25, UTCS, 2004.
|
| |
8
|
P. Diaconis and D. Freedman. Asymptotics of graphical projection pursuit. Annuals of Statistics 12:793--815, 1984.
|
 |
9
|
|
| |
10
|
C. Domeniconi, D. Papadopoulos, D. Gunopulos, and S. Ma. Subspace clustering of high dimensonal data. In Proceedings of the SIAM International Conference on Data Mining 2004.
|
| |
11
|
J. H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association 84(405):165--175, 1989.
|
| |
12
|
|
 |
13
|
Glenn Fung , Murat Dundar , Jinbo Bi , Bharat Rao, A fast iterative algorithm for fisher discriminant using heterogeneous kernels, Proceedings of the twenty-first international conference on Machine learning, p.40, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015409]
|
| |
14
|
G. H. Golub and C. F. Van Loan. Matrix Computations The Johns Hopkins University Press, third edition, 1996.
|
| |
15
|
P. Hall and K. Li. On almost linearity of low dimensional projections from high dimensional data. Annuals of Statistics 21:867--889, 1993.
|
 |
16
|
|
| |
17
|
I. Jolliffe. Principal Component Analysis Springer; 2nd edition, 2002.
|
 |
18
|
|
 |
19
|
|
| |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
S. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond MIT Press, 2002.
|
| |
25
|
|
| |
26
|
|
| |
27
|
A. Smola and I. Kondor. Kernels and regularization on graphs. In Proceedings of the Annual Conference on Computational Learning Theory 2003.
|
| |
28
|
|
| |
29
|
J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319--2323, 2000.
|
| |
30
|
H. Valizadegan and R. Jin. Generalized maximum margin clustering and unsupervised kernel learning. In Advances in Neural Information Processing Systems 2006.
|
| |
31
|
K. Weinberger, J. Blitzer, and L. Saul. Distance metric learning for large margin nearest neighbor classification. In Advances in Neural Information Processing Systems 2005.
|
| |
32
|
E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems 2002.
|
| |
33
|
B. Yan and C. Domeniconi. An adaptive kernel method for semi-supervised clustering. In Proceedings of the Seventeenth European Conference on Machine Learning 2006.
|
| |
34
|
L. Yang and R. Jin. Distance metric learning: A comprehensive survey. Technical report, Department of Computer Science and Engineering, Michigan State University, 2006.
|
| |
35
|
L. Yang, R. Jin, R. Sukthankar, and Y. Liu. An efficient algorithm for local distance metric learning. In Proceedings of the Twenty-first National Conference on Artificial Intelligence 2006.
|
 |
36
|
|
| |
37
|
J. Ye, Z. Zhao, and H. Liu. Adaptive distance metric learning for clustering. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2007.
|
| |
38
|
H. Zha, C. Ding, M. Gu, X. He, and H. Simon. Spectral relaxation for k-means clustering. In Advances in Neural Information Processing Systems 2001.
|
|