|
ABSTRACT
Although support vector machines (SVMs) for binary classification give rise to a decision rule that only relies on a subset of the training data points (support vectors), it will in general be based on all available features in the input space. We propose two direct, novel convex relaxations of a non-convex sparse SVM formulation that explicitly constrains the cardinality of the vector of feature weights. One relaxation results in a quadratically-constrained quadratic program (QCQP), while the second is based on a semidefinite programming (SDP) relaxation. The QCQP formulation can be interpreted as applying an adaptive soft-threshold on the SVM hyperplane, while the SDP formulation learns a weighted inner-product (i.e. a kernel) that results in a sparse hyperplane. Experimental results show an increase in sparsity while conserving the generalization performance compared to a standard as well as a linear programming SVM.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Bennett, K. P., & Mangasarian, O. L. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw., 1, 23--34.
|
| |
2
|
|
| |
3
|
|
| |
4
|
Borchers, B. (1999). CSDP, a C library for semidefinite programming. Optim. Methods Softw., 11, 613--623.
|
| |
5
|
|
| |
6
|
|
| |
7
|
Bradley, P. S., & Mangasarian, O. L. (2000). Massive data discrimination via linear support vector machines. Optim. Methods Softw., 13(1), 1--10.
|
| |
8
|
Chan, A. B., Vasconcelos, N., & Lanckriet, G. R. G. (2007). Duals of the QCQP and SDP sparse SVM (Technical Report SVCL-TR-2007-02). University of California, San Diego. http://www.svcl.ucsd.edu.
|
| |
9
|
|
| |
10
|
Grandvalet, Y., & Canu, S. (2003). Adaptive scaling for feature selection in SVMs. Neural Information Processing Systems.
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
Lemaréchal, C., & Oustry, F. (1999). Semidefinite relaxations and Lagrangian duality with application to combinatorial optimization (Technical Report 3710). INRIA.
|
| |
15
|
MOSEK (2006). MOSEK optimization software (Technical Report). http://www.mosek.com/.
|
| |
16
|
|
| |
17
|
Newman, D. J., Hettich, S., Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases (Technical Report). http://www.ics.uci.edu/~mlearn.
|
| |
18
|
Peleg, D., & Meir, R. (2004). A feature selection algorithm based on the global minimization of a generalization error bound. Neural Information Processing Systems.
|
| |
19
|
|
| |
20
|
Sturm, J. F. (1999). Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw., 11, 625--653.
|
| |
21
|
|
| |
22
|
|
| |
23
|
Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., & Vapnik, V. (2000). Feature selection for SVMs. Neural Information Processing Systems.
|
| |
24
|
Zhu, J., Rossett, S., Hastie, T., & Tibshirani, R. (2003). 1-norm support vector machines. Neural Information Processing Systems.
|
CITED BY 3
|
|
Zenglin Xu , Rong Jin , Jieping Ye , Michael R. Lyu , Irwin King, Non-monotonic feature selection, Proceedings of the 26th Annual International Conference on Machine Learning, p.1145-1152, June 14-18, 2009, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|