ABSTRACT
Recent advances in Multiple Kernel Learning (MKL) have positioned it as an attractive tool for tackling many supervised learning tasks. The development of efficient gradient-descent-based optimization schemes has made it possible to tackle large-scale problems. Simultaneously, MKL-based algorithms have achieved very good results on challenging real-world applications. Yet, despite their successes, MKL approaches are limited in that they focus on learning a linear combination of given base kernels. In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. This can be achieved while retaining all the efficiency of existing large-scale optimization algorithms. To highlight the advantages of generalized kernel learning, we tackle feature selection problems on benchmark vision and UCI databases. We demonstrate that the proposed formulation can lead to better results than not only traditional MKL but also state-of-the-art wrapper and filter methods for feature selection.
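To make the contrast between a linear combination of base kernels and a more general combination concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes one RBF base kernel per feature and uses a product of kernels as one possible non-linear combination. The function names, the weight vector `d`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel_1d(x, gamma=1.0):
    """RBF kernel matrix built from a single feature column x of shape (n,)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-gamma * diff ** 2)

def linear_combination(kernels, d):
    """Traditional MKL form (assumed here): K = sum_k d_k * K_k with d_k >= 0."""
    return sum(w * K for w, K in zip(d, kernels))

def product_combination(X, d):
    """One possible general combination (an assumption for illustration):
    K = prod_k exp(-d_k * (x_ik - x_jk)^2), a product of per-feature RBF kernels."""
    n = X.shape[0]
    K = np.ones((n, n))
    for k in range(X.shape[1]):
        K *= rbf_kernel_1d(X[:, k], gamma=d[k])
    return K

# Toy data: 5 points, 3 features; the weights d are hypothetical.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
d = np.array([0.5, 0.0, 2.0])

base = [rbf_kernel_1d(X[:, k]) for k in range(X.shape[1])]
K_sum = linear_combination(base, d)   # weighted sum of fixed base kernels
K_prod = product_combination(X, d)    # weights act inside a kernel product
```

In both forms a zero weight removes the corresponding feature from the combined kernel (the second feature here), which is why a sparsity-inducing regularizer on `d` can act as a feature selector. Either matrix could then be passed to any kernel method that accepts a precomputed kernel.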