ABSTRACT
Traditional non-parametric statistical learning techniques are often computationally attractive, but they lack the generalization and model selection abilities of state-of-the-art Bayesian algorithms, which in turn are usually computationally prohibitive. This paper makes several contributions that allow Bayesian learning to scale to more complex, real-world learning scenarios. First, we show that backfitting, a traditional non-parametric yet highly efficient regression tool, can be derived in a novel formulation within an expectation maximization (EM) framework and can thus be given a probabilistic interpretation. Second, we show that the general framework of sparse Bayesian learning, and in particular the relevance vector machine (RVM), can be derived as a highly efficient algorithm with a Bayesian version of backfitting at its core. As we demonstrate on several regression and classification benchmarks, Bayesian backfitting offers a compelling alternative to current regression methods, especially when the size and dimensionality of the data challenge computational resources.
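For readers unfamiliar with backfitting, the following is a minimal sketch of the classical (non-Bayesian) procedure that the abstract takes as its starting point. It is not the EM or Bayesian formulation developed in the paper. It assumes the simplest possible component functions, one linear term per input dimension, so each "smoother" is a univariate least-squares fit. The function name `backfit_linear` and its parameters are hypothetical and chosen for illustration only.

```python
def backfit_linear(X, y, n_sweeps=100, tol=1e-10):
    """Classical backfitting for y ~ sum_m b[m] * X[i][m].

    Illustrative assumption: every component function is linear in one
    input dimension, and no column of X is identically zero.
    """
    d = len(X[0])
    b = [0.0] * d
    # Residual of the full model; equals y while all coefficients are zero.
    resid = list(y)
    for _ in range(n_sweeps):
        max_change = 0.0
        for m in range(d):
            col = [row[m] for row in X]
            # Partial residual: add this component's current fit back in.
            partial = [r + b[m] * x for r, x in zip(resid, col)]
            # Refit component m alone against the partial residual.
            new_b = sum(x * p for x, p in zip(col, partial)) / sum(x * x for x in col)
            # Update the full-model residual with the refreshed component.
            resid = [p - new_b * x for p, x in zip(partial, col)]
            max_change = max(max_change, abs(new_b - b[m]))
            b[m] = new_b
        if max_change < tol:
            break
    return b


# Small usage example with data generated exactly by y = 2*x1 + 3*x2.
coeffs = backfit_linear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 3.0, 5.0])
```

Each sweep touches one input dimension at a time and never forms or inverts a matrix, which is the source of the computational appeal mentioned in the abstract. With linear components the sweeps amount to Gauss-Seidel iterations on the least-squares normal equations.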
CITED BY 3
Jo-Anne Ting, Aaron D'Souza, Kenji Yamamoto, Toshinori Yoshioka, Donna Hoffman, Shinji Kakei, Lauren Sergio, John Kalaska, Mitsuo Kawato, Peter Strick, Stefan Schaal, 2008 Special Issue: Variational Bayesian least squares: An application to brain-machine interface data, Neural Networks, v.21 n.8, p.1112-1131, October, 2008