ABSTRACT
The power and popularity of kernel methods stem in part from their ability to handle diverse forms of structured inputs, including vectors, graphs and strings. Recently, several methods have been proposed for combining kernels from heterogeneous data sources. However, all of these methods produce stationary combinations; i.e., the relative weights of the various kernels do not vary among input examples. This article proposes a method for combining multiple kernels in a nonstationary fashion. The approach uses a large-margin latent-variable generative model within the maximum entropy discrimination (MED) framework. Latent parameter estimation is rendered tractable by variational bounds and an iterative optimization procedure. The classifier we use is a log-ratio of Gaussian mixtures, in which each component is implicitly mapped via a Mercer kernel function. We show that the support vector machine is a special case of this model. In this approach, discriminative parameter estimation is feasible via a fast sequential minimal optimization algorithm. Empirical results are presented on synthetic data, several benchmarks, and on a protein function annotation task.
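To make the distinction between stationary and nonstationary combinations concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the estimation procedure described above. The stationary case uses one fixed weight per kernel. The nonstationary case uses a hypothetical gating form, K(x, x') = sum_m g_m(x) g_m(x') K_m(x, x'), chosen because it remains positive semidefinite whenever each K_m is. The names `softmax_gates`, `V`, and `nonstationary_combination` are invented for this example, and the gate parameters are simply supplied here rather than learned within the MED framework.

```python
import numpy as np

def linear_kernel(X, Y):
    """Gram matrix of the linear kernel."""
    return X @ Y.T

def rbf_kernel(X, Y, gamma=1.0):
    """Gram matrix of the Gaussian RBF kernel."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def stationary_combination(kernels, weights):
    """K = sum_m w_m K_m: the same weights apply to every pair of examples."""
    return sum(w * K for w, K in zip(weights, kernels))

def softmax_gates(X, V):
    """Hypothetical input-dependent weights g_m(x); V is an assumed parameter matrix."""
    scores = X @ V
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def nonstationary_combination(kernels, gates_X, gates_Y):
    """K(x, x') = sum_m g_m(x) g_m(x') K_m(x, x') (illustrative form, assumed here).

    gates_X has shape (n, M) and gates_Y has shape (p, M), one column per kernel.
    """
    return sum(np.outer(gates_X[:, m], gates_Y[:, m]) * K
               for m, K in enumerate(kernels))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))
    kernels = [linear_kernel(X, X), rbf_kernel(X, X)]
    G = softmax_gates(X, rng.normal(size=(3, 2)))
    K = nonstationary_combination(kernels, G, G)
    print(K.shape)
```

With constant gates equal to the square roots of fixed weights, this form reduces to the stationary combination, so the stationary case is recovered when the gates ignore the input.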