ABSTRACT
Integrating diverse sources of informative data by learning an optimal combination of base kernels can improve performance in classification and regression problems over that obtained from any single data source. We present a Bayesian hierarchical model that enables kernel learning, together with effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method. Matlab code replicating the reported results is available at http://www.dcs.gla.ac.uk/~srogers/kernel_comb.html.