Prediction of Cancer Class with Majority Voting Genetic Programming Classifier Using Gene Expression Data
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Volume 6, Issue 2 (April 2009)
Pages 353-367
Year of Publication: 2009
ISSN: 1545-5963
Authors
Topon Kumar Paul, Toshiba Corporation, Kanagawa
Hitoshi Iba, The University of Tokyo, Japan
Publisher
IEEE Computer Society Press, Los Alamitos, CA, USA
DOI Bookmark: 10.1109/TCBB.2007.70245

ABSTRACT

To better understand different types of cancer and to find possible disease biomarkers, many researchers have recently been analyzing gene expression data with various machine learning techniques. However, because the number of training samples is very small compared with the huge number of genes, and because the classes are imbalanced, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples, determining their labels by majority voting. In experiments on four public cancer data sets, including multiclass data sets, the test accuracies of MVGPC were better than those of other methods, including AdaBoost with GP. Moreover, some of the more frequently occurring genes in the classification rules are known to be associated with the types of cancer studied in this paper.
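
The voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three rules are hand-written stand-ins for GP-evolved rules, and the gene names (g1, g2, g3), thresholds, class labels, and the function name majority_vote are all hypothetical. The sketch only shows how several independent rules could each label a test sample and how the most frequent label is taken as the prediction.

```python
from collections import Counter
from typing import Callable, Dict, List

Sample = Dict[str, float]          # gene name -> expression value
Rule = Callable[[Sample], str]     # a rule maps a sample to a class label


def majority_vote(rules: List[Rule], sample: Sample) -> str:
    """Apply every rule to the sample and return the most frequent label."""
    votes = Counter(rule(sample) for rule in rules)
    # most_common sorts by count; equal counts keep first-seen order,
    # so an odd number of rules avoids ties in the two-class case.
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for rules that GP would evolve (assumption).
rules: List[Rule] = [
    lambda s: "tumor" if s["g1"] - s["g2"] > 0 else "normal",
    lambda s: "tumor" if s["g3"] > 1.5 else "normal",
    lambda s: "tumor" if s["g1"] * s["g3"] > 2.0 else "normal",
]

# Hypothetical test sample (assumption): one rule votes "tumor", two "normal".
sample: Sample = {"g1": 2.0, "g2": 1.0, "g3": 0.5}
predicted = majority_vote(rules, sample)
```

An odd number of rules is used here so that a two-class vote cannot tie; how ties are resolved in the multiclass case is not stated in the abstract, and the first-seen tie-break above is only a default of this sketch.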


