Consensus group stable feature selection

Source
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Research track papers
Pages 567-576
Year of Publication: 2009
ISBN: 978-1-60558-495-9

Authors
Steven Loscalzo, Binghamton University, Binghamton, NY, USA
Lei Yu, Binghamton University, Binghamton, NY, USA
Chris Ding, University of Texas at Arlington, Arlington, TX, USA

ABSTRACT
Stability is an important yet under-addressed issue in feature selection from high-dimensional and small sample data. In this paper, we show that stability of feature selection has a strong dependency on sample size. We propose a novel framework for stable feature selection which first identifies consensus feature groups from subsampling of training samples, and then performs feature selection by treating each consensus feature group as a single entity. Experiments on both synthetic and real-world data sets show that an algorithm developed under this framework is effective at alleviating the problem of small sample size and leads to more stable feature selection results and comparable or better generalization performance than state-of-the-art feature selection algorithms. Synthetic data sets and algorithm source code are available at http://www.cs.binghamton.edu/~lyu/KDD09/.
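
The two-stage idea in the abstract (find consensus feature groups across subsamples, then select whole groups) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how groups are found, so the sketch assumes a simple stand-in where features are grouped by thresholded absolute correlation and merged by connected components. The function names (`connected_components`, `consensus_groups`, `select_groups`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def connected_components(adj):
    """Label the connected components of a boolean adjacency matrix."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

def consensus_groups(X, n_subsamples=20, frac=0.8, corr_thresh=0.8,
                     agree=0.5, seed=0):
    """Group features that are grouped together in most subsamples.

    Assumption: grouping within a subsample uses |correlation| >= corr_thresh,
    a stand-in for whatever group finder the paper actually uses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    co_assoc = np.zeros((d, d))
    for _ in range(n_subsamples):
        # Draw a subsample of training rows without replacement.
        idx = rng.choice(n, size=max(2, int(frac * n)), replace=False)
        corr = np.abs(np.corrcoef(X[idx], rowvar=False))
        labels = connected_components(corr >= corr_thresh)
        # Count feature pairs that landed in the same group.
        co_assoc += labels[:, None] == labels[None, :]
    co_assoc /= n_subsamples
    # Consensus: pairs grouped together in at least `agree` of the subsamples.
    return connected_components(co_assoc >= agree)

def select_groups(X, y, groups, k=1):
    """Rank each group as a single entity and return the top-k member lists.

    Assumption: a group's relevance is the mean |correlation| of its
    members with the label y.
    """
    scores = {}
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        rel = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in members]
        scores[int(g)] = float(np.mean(rel))
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return [np.flatnonzero(groups == g).tolist() for g in ranked]

# Toy data: two blocks of near-duplicate features plus independent noise.
rng = np.random.default_rng(42)
z1, z2 = rng.normal(size=60), rng.normal(size=60)
block1 = np.column_stack([z1 + 0.1 * rng.normal(size=60) for _ in range(3)])
block2 = np.column_stack([z2 + 0.1 * rng.normal(size=60) for _ in range(3)])
noise = rng.normal(size=(60, 4))
X_demo = np.hstack([block1, block2, noise])
y_demo = (z1 > 0).astype(float)

groups_demo = consensus_groups(X_demo)
top_demo = select_groups(X_demo, y_demo, groups_demo, k=1)
print(groups_demo, top_demo)
```

In this toy setup the label depends only on the first latent variable, so the top-ranked group would be expected to contain the three features built from it. Selecting at the group level is what makes the result less sensitive to which of several near-duplicate features a particular subsample happens to favor.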