ABSTRACT
In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem is particularly challenging because the number of features (genes) is large and the sample size is small. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes from the selected set can better represent the characteristics of the targeted phenotypes and lead to improved classification accuracy. In this paper, we therefore study the relationship between feature relevance and redundancy and propose an efficient method that effectively removes redundant genes. An empirical study on public microarray data sets demonstrates the efficiency and effectiveness of our method in comparison with representative methods.