ABSTRACT
Support Vector Machines (SVMs) and other kernel methods have proven very effective for nonlinear inference. Two practical issues remain: how to select the kernel type, including its parameters, and how to cope with the computational cost caused by the kernel matrix growing quadratically with the number of data points. Inspired by ensemble and boosting methods such as MART, we propose the Multiple Additive Regression Kernels (MARK) algorithm to address these issues. MARK considers a large (potentially infinite) library of kernel matrices formed by different kernel functions and parameters. Using gradient boosting/column generation, MARK constructs columns of the heterogeneous kernel matrix (the base hypotheses) on the fly and adds them to the kernel ensemble. Regularization methods such as those used in SVM, kernel ridge regression, and MART are used to prevent overfitting. We investigate how MARK applies to heterogeneous kernel ridge regression. The resulting algorithm is simple to implement and efficient. Kernel parameter selection is handled within MARK. Sampling and "weak" kernels are used to further enhance the computational efficiency of the resulting additive algorithm. The user can incorporate, and potentially extract, domain knowledge by restricting the kernel library to interpretable kernels. MARK compares very favorably with SVM and kernel ridge regression on several benchmark datasets.
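The core loop described above (treat each column of a heterogeneous kernel matrix as a base hypothesis, pick the column that best fits the current residual, and add it to the ensemble with damping) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' MARK implementation: the kernel library `KERNEL_LIBRARY` (its RBF widths and polynomial degree), the helper names `boost_kernel_columns` and `predict`, and the choice of a single-column ridge step with shrinkage are all hypothetical illustrations; the paper's actual regularization and column-selection rules may differ.

```python
import math
import random

# Hypothetical kernel library: each entry maps a name to a kernel k(x, z).
# The RBF widths and polynomial degree are illustrative, not from the paper.
def rbf(gamma):
    return lambda x, z: math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def poly(degree):
    return lambda x, z: (1.0 + sum(a * b for a, b in zip(x, z))) ** degree

KERNEL_LIBRARY = {
    "rbf_0.1": rbf(0.1),
    "rbf_1": rbf(1.0),
    "rbf_10": rbf(10.0),
    "poly_2": poly(2),
}

def boost_kernel_columns(X, y, n_iter=50, shrinkage=0.5, ridge=1e-3,
                         sample_size=None, seed=0):
    """Stagewise fit of f(x) = b + sum_t a_t * k_t(x, X[j_t]).

    Each base hypothesis is one kernel column: a kernel from KERNEL_LIBRARY
    centered at one training point (assumed setup for illustration).
    """
    rng = random.Random(seed)
    n = len(X)
    bias = sum(y) / n
    pred = [bias] * n
    ensemble = []  # (kernel_name, center_index, coefficient)
    candidates = [(name, j) for name in KERNEL_LIBRARY for j in range(n)]
    for _ in range(n_iter):
        # For squared loss the negative gradient is simply the residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        # Optionally score only a random subset of columns to save work.
        if sample_size is None:
            pool = candidates
        else:
            pool = rng.sample(candidates, min(sample_size, len(candidates)))
        best = None
        for name, j in pool:
            k = KERNEL_LIBRARY[name]
            col = [k(X[i], X[j]) for i in range(n)]
            num = sum(c * r for c, r in zip(col, resid))
            den = sum(c * c for c in col) + ridge
            # Reduction in ridge-penalized squared error from the best
            # single-column step a = num / den.
            gain = num * num / den
            if best is None or gain > best[0]:
                best = (gain, name, j, num / den, col)
        _, name, j, coef, col = best
        coef *= shrinkage  # damped step, as in MART-style shrinkage
        pred = [p + coef * c for p, c in zip(pred, col)]
        ensemble.append((name, j, coef))
    return bias, ensemble

def predict(model, X_train, x):
    bias, ensemble = model
    return bias + sum(a * KERNEL_LIBRARY[name](x, X_train[j])
                      for name, j, a in ensemble)
```

Passing `sample_size` scores only a random subset of candidate columns per iteration, which loosely mirrors the abstract's use of sampling to reduce cost. Because every update is a damped ridge step along a single column, the training squared error never increases from one iteration to the next.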