ABSTRACT
An interesting recent development in kernel methods seeks to learn a good kernel automatically from empirical data. In this paper, we regard the transductive learning of the kernel matrix as a missing data problem, propose a Bayesian hierarchical model for it, and devise a Tanner-Wong data augmentation algorithm for inference on the model. The Tanner-Wong algorithm is closely related to Gibbs sampling and also bears a strong resemblance to the expectation-maximization (EM) algorithm. For an efficient implementation, we propose a simplified Bayesian hierarchical model and the corresponding Tanner-Wong algorithm. We express the relationship between the kernel on the input space and the kernel on the output space as a symmetric-definite generalized eigenproblem. Based on this eigenproblem, we present an efficient approach to choosing the base kernel matrices. Classification experiments demonstrate the effectiveness of our Bayesian model with the Tanner-Wong algorithm.
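To make the data augmentation idea concrete, the following is a minimal sketch of the Tanner-Wong scheme on a deliberately simple toy problem, not the kernel-matrix model of the paper. It assumes scalar data drawn from N(mu, 1) with known unit variance, a flat prior on mu, and some values missing completely at random; the function name `data_augmentation` and all parameters are hypothetical. Each sweep alternates an imputation step (draw the missing data given the current parameter) with a posterior step (draw the parameter given the completed data), which is what gives the algorithm its Gibbs-like structure and its resemblance to the E and M steps of EM.

```python
import random
import statistics


def data_augmentation(observed, n_missing, n_iter=20000, burn_in=1000, seed=0):
    """Toy Tanner-Wong data augmentation for the mean of N(mu, 1) data.

    Assumptions (illustrative only, not from the paper): unit variance,
    flat prior on mu, values missing completely at random.
    """
    rng = random.Random(seed)
    n = len(observed) + n_missing
    obs_sum = sum(observed)
    mu = 0.0  # arbitrary starting value
    draws = []
    for t in range(n_iter + burn_in):
        # Imputation step: draw the missing values given the current mu.
        imputed = [rng.gauss(mu, 1.0) for _ in range(n_missing)]
        # Posterior step: draw mu from its complete-data posterior,
        # which is N(complete-data mean, 1/n) under the flat prior.
        complete_mean = (obs_sum + sum(imputed)) / n
        mu = rng.gauss(complete_mean, (1.0 / n) ** 0.5)
        if t >= burn_in:
            draws.append(mu)
    return draws


observed = [1.2, 0.7, 1.9, 1.4, 0.8]
draws = data_augmentation(observed, n_missing=3)
posterior_mean = statistics.fmean(draws)
posterior_var = statistics.pvariance(draws)
print(f"posterior mean ~ {posterior_mean:.3f}, variance ~ {posterior_var:.3f}")
```

In this toy setting the exact posterior of mu given only the observed values is normal, with mean equal to the observed sample mean and variance 1 divided by the number of observed values, so the sampled mean and variance can be checked against those closed-form values.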