Redundancy based feature selection for microarray data
Source: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (International Conference on Knowledge Discovery and Data Mining series)
Seattle, WA, USA
Session: Poster session, Research track posters
Pages: 737-742
Year of Publication: 2004
ISBN: 1-58113-888-1
Authors
Lei Yu, Arizona State University, Tempe, AZ
Huan Liu, Arizona State University, Tempe, AZ
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1014052.1014149

ABSTRACT

In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem is particularly challenging because of the large number of features (genes) and the small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes from the selected ones can yield a better representation of the characteristics of the targeted phenotypes and improve classification accuracy. In this paper, we therefore study the relationship between feature relevance and redundancy and propose an efficient method that effectively removes redundant genes. The efficiency and effectiveness of our method, compared with representative methods, have been demonstrated in an empirical study on public microarray data sets.
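The abstract contrasts two strategies: ranking genes by individual discriminative power, and removing redundant genes among the selected ones. The sketch below illustrates that contrast on made-up data; it is not the authors' method. It assumes absolute Pearson correlation as both the gene-class relevance measure and the gene-gene redundancy measure (a stand-in chosen for brevity; the authors' earlier fast correlation-based filter uses symmetrical uncertainty instead). A gene is treated as redundant when an already-kept, at least as relevant gene correlates with it at least as strongly as it correlates with the class, a rule in the style of an approximate Markov blanket. The function names `rank_top_k` and `remove_redundant`, the threshold `delta`, and the toy genes are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch: Pearson correlation stands in for the paper's
# relevance and redundancy measures (assumption, not the authors' choice).


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_top_k(genes: Dict[str, Sequence[float]], labels: Sequence[float], k: int) -> List[str]:
    """Traditional filter: keep the k genes with the largest |corr(gene, class)|."""
    ranked = sorted(genes, key=lambda g: abs(pearson(genes[g], labels)), reverse=True)
    return ranked[:k]


def remove_redundant(genes: Dict[str, Sequence[float]], labels: Sequence[float],
                     delta: float = 0.0) -> List[str]:
    """Hypothetical redundancy filter in the style of an approximate Markov blanket.

    Genes with relevance <= delta are discarded. The rest are visited in
    decreasing relevance; gene g_j is dropped if some already-kept (hence at
    least as relevant) gene g_i satisfies |corr(g_i, g_j)| >= |corr(g_j, class)|.
    """
    relevance = {g: abs(pearson(v, labels)) for g, v in genes.items()}
    ranked = sorted((g for g in genes if relevance[g] > delta),
                    key=lambda g: relevance[g], reverse=True)
    kept: List[str] = []
    for gj in ranked:
        if not any(abs(pearson(genes[gi], genes[gj])) >= relevance[gj] for gi in kept):
            kept.append(gj)
    return kept


# Toy data (made up for illustration): 8 samples, two phenotypes.
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
GENES = {
    "gA": [1, 2, 1, 2, 5, 6, 5, 6],               # strongly discriminative
    "gA_copy": [11, 12, 11, 12, 15, 16, 15, 16],  # gA shifted by 10: fully redundant
    "gB": [3, 1, 3, 1, 4, 2, 4, 2],               # weaker, but not redundant with gA
    "gC": [1, 2, 2, 1, 1, 2, 2, 1],               # uncorrelated with the class
}

if __name__ == "__main__":
    print(rank_top_k(GENES, LABELS, 2))     # ['gA', 'gA_copy']
    print(remove_redundant(GENES, LABELS))  # ['gA', 'gB']
```

On this toy data, plain top-2 ranking spends both slots on two copies of the same signal, while the redundancy filter keeps `gA`, drops its copy, and admits the weaker but non-redundant `gB`. Pearson correlation only captures linear dependence, which is one reason information-theoretic measures are often preferred for this step.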

