Consensus group stable feature selection
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Research track papers
Pages 567-576  
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Steven Loscalzo  Binghamton University, Binghamton, NY, USA
Lei Yu  Binghamton University, Binghamton, NY, USA
Chris Ding  University of Texas at Arlington, Arlington, TX, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 56,   Downloads (12 Months): 162,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1557019.1557084

ABSTRACT

Stability is an important yet under-addressed issue in feature selection from high-dimensional and small sample data. In this paper, we show that stability of feature selection has a strong dependency on sample size. We propose a novel framework for stable feature selection which first identifies consensus feature groups from subsampling of training samples, and then performs feature selection by treating each consensus feature group as a single entity. Experiments on both synthetic and real-world data sets show that an algorithm developed under this framework is effective at alleviating the problem of small sample size and leads to more stable feature selection results and comparable or better generalization performance than state-of-the-art feature selection algorithms. Synthetic data sets and algorithm source code are available at http://www.cs.binghamton.edu/~lyu/KDD09/.
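
The two-stage idea in the abstract (find feature groups that recur across subsamples of the training data, then select among groups rather than individual features) can be illustrated with a minimal sketch. This is not the authors' algorithm; their source code is at the URL above. The sketch assumes a simple stand-in for each stage: features are grouped by thresholding pairwise Pearson correlation within each subsample, groups are merged into a consensus through a co-association matrix, and each consensus group is scored by the absolute correlation between its mean and the class label. The function names and all parameters (`n_subsamples`, `frac`, `corr_thresh`, `agree`, `k`) are hypothetical.

```python
import numpy as np


def _components(adj):
    """Label the connected components of a boolean adjacency matrix."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def consensus_groups(X, n_subsamples=20, frac=0.8, corr_thresh=0.7,
                     agree=0.5, seed=0):
    """Group features that are grouped together in most subsamples.

    Assumption: grouping inside a subsample is done by linking features
    whose Pearson correlation is at least corr_thresh.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(2, int(frac * n))
    co = np.zeros((d, d))  # co-association counts between feature pairs
    for _ in range(n_subsamples):
        rows = rng.choice(n, size=m, replace=False)
        corr = np.corrcoef(X[rows], rowvar=False)
        labels = _components(corr >= corr_thresh)
        co += labels[:, None] == labels[None, :]
    co /= n_subsamples
    # Two features share a consensus group if they were grouped together
    # in at least a fraction `agree` of the subsamples (transitively).
    return _components(co >= agree)


def select_groups(X, y, labels, k=2):
    """Treat each group as one entity and return the members of the top k.

    Assumption: relevance is |corr(group mean, y)| with y coded numerically.
    """
    scored = []
    for g in np.unique(labels):
        members = np.flatnonzero(labels == g)
        rep = X[:, members].mean(axis=1)  # one representative per group
        score = abs(np.corrcoef(rep, y)[0, 1])
        scored.append((score, members))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [members for _, members in scored[:k]]
```

Calling `consensus_groups(X)` on a samples-by-features array returns one group label per feature, and `select_groups(X, y, labels, k)` returns the feature indices of the `k` highest-scoring groups. Because a group is selected or rejected as a whole, swapping one highly correlated feature for another across training subsamples no longer changes the selected set, which is the intuition behind the stability gain the abstract reports.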

