ABSTRACT
This paper addresses feature selection techniques for the classification of high-dimensional data, such as those produced by microarray experiments. In this context, some prior knowledge may be available to bias the selection towards dimensions (genes) assumed a priori to be more relevant. We propose a feature selection method that makes use of this partial supervision. It extends previous work on embedded feature selection with linear models, in which regularization enforces sparsity. A practical approximation of this technique reduces to standard SVM learning with iterative rescaling of the inputs. Here the scaling factors depend on the prior knowledge, but the final selection may depart from it. Experimental results on several microarray data sets show that the proposed approach improves the stability of the selected gene lists while also improving classification performance.
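The idea of SVM learning with iterative rescaling of the inputs can be illustrated with a small sketch. It is not the authors' exact procedure: the multiplicative update `z <- z * |w| * prior`, the plain subgradient-descent trainer standing in for a standard SVM solver, the function names, the hyperparameters, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    A simple stand-in for a standard linear SVM solver (assumption).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # samples that violate the margin
        grad_w = lam * w - X[active].T @ y[active] / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def rescaling_selection(X, y, prior, n_iter=10, n_keep=3):
    """Retrain on rescaled inputs; scale factors combine |w| and the prior.

    The update rule used here is a hypothetical illustration.
    """
    z = np.ones(X.shape[1])
    for _ in range(n_iter):
        w, _ = train_linear_svm(X * z, y)
        z = z * np.abs(w) * prior
        top = z.max()
        if top == 0.0:
            break
        z = z / top  # keep the largest scale factor at 1
    selected = np.argsort(-z)[:n_keep]
    return selected, z


# Synthetic data: only features 0, 1 and 2 determine the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = np.where(X[:, 0] + X[:, 1] + X[:, 2] > 0, 1.0, -1.0)

# Prior knowledge favours feature 0 (informative) and feature 5 (not).
prior = np.ones(20)
prior[[0, 5]] = 2.0

selected, z = rescaling_selection(X, y, prior)
print("selected features:", selected)
```

In this toy setting the prior multiplies the scale factor of feature 5 at every iteration, but because the classifier assigns it a small weight, its scale factor shrinks anyway. This mirrors the statement that the final selection may depart from the prior knowledge.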