| Non-monotonic feature selection |
| Full text |
Pdf
(697 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 382
archive
Proceedings of the 26th Annual International Conference on Machine Learning
table of contents
Montreal, Quebec, Canada
Pages 1145-1152
Year of Publication: 2009
ISBN:978-1-60558-516-1
|
|
Authors
|
|
Zenglin Xu
|
The Chinese University of Hong Kong, N.T., Hong Kong
|
|
Rong Jin
|
Michigan State University, East Lansing, MI
|
|
Jieping Ye
|
Arizona State University, Tempe, AZ
|
|
Michael R. Lyu
|
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
|
|
Irwin King
|
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 17, Downloads (12 Months): 44, Citation Count: 0
|
|
|
ABSTRACT
We consider the problem of selecting a subset of m most informative features where m is the number of required features. This feature selection problem is essentially a combinatorial optimization problem, and is usually solved by an approximation. Conventional feature selection methods address the computational challenge in two steps: (a) ranking all the features by certain scores that are usually computed independently from the number of specified features m, and (b) selecting the top m ranked features. One major shortcoming of these approaches is that if a feature f is chosen when the number of specified features is m, it will always be chosen when the number of specified features is larger than m. We refer to this property as the "monotonic" property of feature selection. In this work, we argue that it is important to develop efficient algorithms for non-monotonic feature selection. To this end, we develop an algorithm for non-monotonic feature selection that approximates the related combinatorial optimization problem by a Multiple Kernel Learning (MKL) problem. We also present a strategy that derives a discrete solution from the approximate solution of MKL, and show the performance guarantee for the derived discrete solution when compared to the global optimal solution for the related combinatorial optimization problem. An empirical study with a number of benchmark data sets indicates the promising performance of the proposed framework compared with several state-of-the-art approaches for feature selection.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Francis R. Bach , Gert R. G. Lanckriet , Michael I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Proceedings of the twenty-first international conference on Machine learning, p.6, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015424]
|
| |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
Cortes, C., Mohri, M., & Rostamizadeh, A. (2008). Learning sequence kernels. IEEE Workshop on Machine Learning for Signal Processing (pp. 2--8).
|
| |
6
|
Cristianini, N., Shawe-Taylor, J., Elisseeff, A., & Kandola, J. S. (2001). On kernel-target alignment. Proc. Neur. Info. Proc. Sys. (pp. 367--373).
|
| |
7
|
Efron, B., Hastie, T., Johnstone, L., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407--499.
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
Koller, D., & Sahami, M. (1996). Toward optimal feature selection. Proc. Int. Conf. Mach. Learn. (pp. 284--292).
|
| |
12
|
|
| |
13
|
|
 |
14
|
Andrew Y. Ng, Feature selection, L1 vs. L2 regularization, and rotational invariance, Proceedings of the twenty-first international conference on Machine learning, p.78, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015435]
|
| |
15
|
|
 |
16
|
Alain Rakotomamonjy , Francis Bach , Stéphane Canu , Yves Grandvalet, More efficiency in multiple kernel learning, Proceedings of the 24th international conference on Machine learning, p.775-782, June 20-24, 2007, Corvalis, Oregon
[doi> 10.1145/1273496.1273594]
|
| |
17
|
|
 |
18
|
Le Song , Alex Smola , Arthur Gretton , Karsten M. Borgwardt , Justin Bedo, Supervised feature selection via dependence estimation, Proceedings of the 24th international conference on Machine learning, p.823-830, June 20-24, 2007, Corvalis, Oregon
[doi> 10.1145/1273496.1273600]
|
| |
19
|
|
| |
20
|
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58, 267--288.
|
| |
21
|
Vapnik, V. N. (1998). Statistical learning theory. John Wiley & Sons.
|
| |
22
|
|
| |
23
|
Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., & Vapnik, V. (2000). Feature selection for SVMs. Proc. Neur. Info. Proc. Sys. (pp. 668--674).
|
| |
24
|
Xu, Z., Jin, R., King, I., & Lyu, M. R. (2009). An extended level method for efficient multiple kernel learning. Proc. Neur. Info. Proc. Sys. (pp. 1825--1832).
|
|