Investigating the Efficacy of Nonlinear Dimensionality Reduction Schemes in Classifying Gene and Protein Expression Studies
Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Volume 5, Issue 3 (July 2008)
Pages 368-384
Year of Publication: 2008
ISSN: 1545-5963
Authors
George Lee  Rutgers University, Piscataway
Carlos Rodriguez  University of Puerto Rico, Mayagüez
Anant Madabhushi  Rutgers University, Piscataway
Publisher
IEEE Computer Society Press  Los Alamitos, CA, USA
Bibliometrics
Downloads (6 Weeks): 12, Downloads (12 Months): 93, Citation Count: 0


DOI: 10.1109/TCBB.2008.36

ABSTRACT

The recent explosion in the acquisition and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation on the ability to accurately classify these high-dimensional datasets stems from the 'curse of dimensionality', which arises when the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts to deal with this issue have mostly centered on the use of a linear dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent nonlinear structure underlying most biomedical data. The goal of this work is to identify appropriate DR methods for the analysis of high-dimensional gene- and protein-expression studies. Toward this end, we empirically and rigorously compare three nonlinear DR schemes (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling), with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
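The abstract's central argument is that linear DR methods, which measure similarity with straight-line Euclidean distances, can miss nonlinear structure that graph-based methods such as Isomap recover. The following is a minimal, illustrative pure-Python sketch of that contrast, not the authors' implementation or experimental protocol. It assumes a synthetic toy dataset (40 points on a 270° arc of the unit circle, standing in for a curved manifold) and uses hypothetical helper names (`make_arc`, `top_eigenvector`, `pca_1d`, `isomap_1d`, `is_monotonic`). The neighborhood size `k=2`, power iteration for the eigenvectors, and Floyd-Warshall for the geodesic distances are likewise choices made only for brevity.

```python
import math


def make_arc(n=40, span=1.5 * math.pi):
    """Sample n points on a unit-circle arc: a toy 1-D nonlinear manifold embedded in 2-D."""
    return [(math.cos(span * i / (n - 1)), math.sin(span * i / (n - 1))) for i in range(n)]


def top_eigenvector(mat, iters=200):
    """Dominant eigenpair of a symmetric matrix via power iteration (assumes a clear spectral gap)."""
    n = len(mat)
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        w = [sum(mat[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(mat[r][c] * v[c] for c in range(n)) for r in range(n)]
    lam = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient estimate of the eigenvalue
    return lam, v


def pca_1d(points):
    """Linear DR: project 2-D points onto their first principal component."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    cov = [[sum(a[i] * a[j] for a in centered) / (n - 1) for j in range(2)] for i in range(2)]
    _, u = top_eigenvector(cov)
    return [x * u[0] + y * u[1] for x, y in centered]


def isomap_1d(points, k=2):
    """Minimal Isomap sketch: kNN graph -> geodesic (shortest-path) distances -> classical MDS to 1-D."""
    n = len(points)
    inf = float("inf")
    d = [[math.dist(p, q) for q in points] for p in points]
    g = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        nearest = sorted(range(n), key=lambda t: d[i][t])[1:k + 1]  # index 0 is i itself
        for j in nearest:
            g[i][j] = g[j][i] = d[i][j]  # symmetric kNN graph with Euclidean edge weights
    # Floyd-Warshall all-pairs shortest paths (assumes the kNN graph is connected)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    # classical MDS: double-center the squared geodesic distance matrix
    sq = [[x * x for x in row] for row in g]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)] for i in range(n)]
    lam, v = top_eigenvector(b)
    return [math.sqrt(lam) * x for x in v]


def is_monotonic(xs):
    """True if xs is strictly increasing or strictly decreasing (order along the curve preserved)."""
    steps = list(zip(xs, xs[1:]))
    return all(a < b for a, b in steps) or all(a > b for a, b in steps)


if __name__ == "__main__":
    pts = make_arc()
    print("PCA preserves order along the arc:   ", is_monotonic(pca_1d(pts)))     # False
    print("Isomap preserves order along the arc:", is_monotonic(isomap_1d(pts)))  # True
```

Because the arc spans more than 180°, every one-dimensional linear projection, PCA's included, folds it back on itself, so points that are far apart along the curve land close together. Isomap instead approximates distances along the curve by shortest paths through the nearest-neighbor graph, and classical MDS on those geodesic distances unrolls the arc. On real expression data the outcome depends on the neighborhood size and on noise, which is why an empirical comparison across several DR schemes is still needed.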


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring," Science, vol. 286, no. 5439, pp. 531-537, 1999.
 
2
Y. Peng, "A Novel Ensemble Machine Learning for Robust Microarray Data Classification," Computers in Biology and Medicine, vol. 36, no. 6, pp. 553-573, 2006.
 
3
C. Shi and L. Chen, "Feature Dimension Reduction for Microarray Data Analysis Using Locally Linear Embedding," Proc. Third Asia Pacific Bioinformatics Conf. (APBC '05), pp. 211-217, 2005.
 
4
S.D. Der, A. Zhou, B.R. Williams, and R.H. Silverman, "Identification of Genes Differentially Regulated by Interferon Alpha, Beta, or Gamma Using Oligonucleotide Arrays," Proc. Nat'l Academy of Sciences of the United States of Am., vol. 95, no. 26, pp. 15623-15628, Dec. 1998.
 
5
 
6
T.M. Huang and V. Kecman, "Gene Extraction for Cancer Diagnosis by Support Vector Machines--An Improvement," Artificial Intelligence in Medicine, vol. 35, nos. 1-2, pp. 185-194, 2005.
 
7
G. Turashvili, J. Bouchal, K. Baumforth, W. Wei, M. Dziechciarkova, J. Ehrmann, J. Klein, E. Fridman, J. Skarda, J. Srovnal, M. Hajduch, P. Murray, and Z. Kolar, "Novel Markers for Differentiation of Lobular and Ductal Invasive Breast Carcinomas by Laser Microdissection and Microarray Analysis," BMC Cancer, vol. 7, no. 55, 2007.
 
8
A.A. Alizadeh, M.B. Eisen, R.E. Davis, C. Ma, I.S. Lossos, A. Rosenwald, J.C. Boldrick, H. Sabet, T. Tran, X. Yu, J.I. Powell, L. Yang, G.E. Marti, T. Moore, J. Hudson, L. Lu, D.B. Lewis, R. Tibshirani, G. Sherlock, W.C. Chan, T.C. Greiner, D.D. Weisenburger, J.O. Armitage, R. Warnke, R. Levy, W. Wilson, M.R. Grever, J.C. Byrd, D. Botstein, P.O. Brown, and L.M. Staudt, "Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling," Nature, vol. 403, no. 6769, pp. 503-511, 2000.
 
9
A. Ben-Dor, L. Bruhn, N. Friedman, I. Nachman, M. Schummer, and Z. Yakhini, "Tissue Classification with Gene Expression Profiles," J. Computational Biology, vol. 7, nos. 3-4, pp. 559-583, 2000.
 
10
M.P. Brown, W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S. Furey, M. Ares, and D. Haussler, "Knowledge-Based Analysis of Microarray Gene Expression Data by Using Support Vector Machines," Proc. Nat'l Academy of Sciences USA, vol. 97, no. 1, pp. 262-267, Jan. 2000.
 
11
A.C. Tan and D. Gilbert, "Ensemble Machine Learning on Gene Expression Data for Cancer Classification," Applied Bioinformatics, vol. 2, no. 3 supplement, pp. S75-S83, 2003.
 
12
 
13
L. Li, W. Jiang, X. Li, K.L. Moser, Z. Guo, L. Du, Q. Wang, E.J. Topol, Q. Wang, and S. Rao, "A Robust Hybrid between Genetic Algorithm and Support Vector Machine for Extracting an Optimal Feature Gene Subset," Genomics, vol. 85, pp. 16-23, 2005.
 
14
 
15
H. Liu, J. Li, and L. Wong, "A Comparative Study on Feature Selection and Classification Methods Using Gene Expression Profiles and Proteomic Patterns," Genome Informatics, vol. 13, pp. 51-60, 2002.
 
16
U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine, "Broad Patterns of Gene Expression Revealed by Clustering of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays," Proc. Nat'l Academy of Sciences, vol. 96, no. 12, pp. 6745-6750, 1999.
 
17
D. Singh, P.G. Febbo, K. Ross, D.G. Jackson, J. Manola, C. Ladd, P. Tamayo, A.A. Renshaw, A.V. D'Amico, J.P. Richie, E.S. Lander, M. Loda, P.W. Kantoff, T.R. Golub, and W.R. Sellers, "Gene Expression Correlates of Clinical Prostate Cancer Behavior," Cancer Cell, vol. 1, no. 2, pp. 203-209, 2002.
 
18
M. Park, J.W. Lee, J.B. Lee, and S.H. Song, "Several Biplot Methods Applied to Gene Expression Data," J. Statistical Planning and Inference, vol. 138, pp. 500-515, 2007.
 
19
E.F. Petricoin, A.M. Ardekani, B.A. Hitt, P.J. Levine, V.A. Fusaro, S.M. Steinberg, G.B. Mills, C. Simone, D.A. Fishman, E.C. Kohn, and L.A. Liotta, "Use of Proteomic Patterns in Serum to Identify Ovarian Cancer," The Lancet, vol. 359, no. 9306, pp. 572-577, 2002.
 
20
P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E.S. Lander, and T.R. Golub, "Interpreting Patterns of Gene Expression with Self-Organizing Maps: Methods and Application to Hematopoietic Differentiation," Proc. Nat'l Academy of Sciences USA, vol. 96, no. 6, pp. 2907-2912, Mar. 1999.
 
21
S. Yang, J. Shin, K.H. Park, H.-C. Jeung, S.Y. Rha, S.H. Noh, W.I. Yang, and H.C. Chung, "Molecular Basis of the Differences between Normal and Tumor Tissues of Gastric Cancer," Biochimica et Biophysica Acta, 2007.
 
22
L.J. van 't Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend, "Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer," Nature, vol. 415, pp. 530-536, 2002.
 
23
S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y.H. Kim, L.C. Goumnerova, P.M. Black, C. Lau, J.C. Allen, D. Zagzag, J.M. Olson, T. Curran, C. Wetmore, J.A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D.N. Louis, J.P. Mesirov, E.S. Lander, and T.R. Golub, "Prediction of Central Nervous System Embryonal Tumour Outcome Based on Gene Expression," Nature, vol. 415, pp. 436-442, 2002.
 
24
W.A. Freije, F.E. Castro-Vargas, Z. Fang, S. Horvath, T. Cloughesy, L.M. Liau, P.S. Mischel, and S.F. Nelson, "Gene Expression Profiling of Gliomas Strongly Predicts Survival," Cancer Research, vol. 64, no. 18, pp. 6503-6510, 2004.
 
25
M.A. Shipp, K.N. Ross, P. Tamayo, A.P. Weng, J.L. Kutok, R.C.T. Aguiar, M. Gaasenbeek, M. Angelo, M. Reich, G.S. Pinkus, T.S. Ray, M.A. Koval, K.W. Last, A. Norton, T.A. Lister, J. Mesirov, D.S. Neuberg, E.S. Lander, J.C. Aster, and T.R. Golub, "Diffuse Large B-Cell Lymphoma Outcome Prediction by Gene-Expression Profiling and Supervised Machine Learning," Nature Medicine, vol. 8, pp. 68-74, 2002.
 
26
D.G. Beer, S.L.R. Kardia, C.-C. Huang, T.J. Giordano, A.M. Levin, D.E. Misek, L. Lin, G. Chen, T.G. Gharib, D.G. Thomas, M.L. Lizyness, R. Kuick, S. Hayasaka, J.M.G. Taylor, M.D. Iannettoni, M.B. Orringer, and S. Hanash, "Gene-Expression Profiles Predict Survival of Patients with Lung Adenocarcinoma," Nature Medicine, vol. 8, pp. 816-823, 2002.
 
27
D.A. Wigle, I. Jurisica, N. Radulovich, M. Pintilie, J. Rossant, N. Liu, C. Lu, J. Woodgett, I. Seiden, M. Johnston, S. Keshavjee, G. Darling, T. Winton, B.-J. Breitkreutz, P. Jorgenson, M. Tyers, F.A. Shepherd, and M.S. Tsao, "Molecular Profiling of Non-Small Cell Lung Cancer and Correlation with Disease-Free Survival," Cancer Research, vol. 62, pp. 3005-3008, 2002.
 
28
R.E. Bellman, Adaptive Control Processes. Princeton Univ. Press, 1961.
 
29
 
30
Z. Liu, D. Chen, and H. Bensmail, "Gene Expression Data Classification with Kernel Principal Component Analysis," J. Biomedicine and Biotechnology, vol. 2, pp. 155-159, 2005.
 
31
 
32
E.-J. Yeoh, M.E. Ross, S.A. Shurtleff, W.K. Williams, D. Patel, R. Mahfouz, F.G. Behm, S.C. Raimondi, M.V. Relling, A. Patel, C. Cheng, D. Campana, D. Wilkins, X. Zhou, J. Li, H. Liu, C.-H. Pui, W.E. Evans, C. Naeve, L. Wong, and J.R. Downing, "Classification, Subtype Discovery, and Prediction of Outcome in Pediatric Acute Lymphoblastic Leukemia by Gene Expression Profiling," Cancer Cell, vol. 1, no. 2, pp. 133-143, 2002.
 
33
J.J. Dai, L. Lieu, and D. Rocke, "Dimension Reduction for Classification with Gene Expression Microarray Data," Statistical Applications in Genetics and Molecular Biology, vol. 5, no. 1, pp. 1-15, 2006.
 
34
K. Dawson, R.L. Rodriguez, and W. Malyj, "Sample Phenotype Clusters in High-Density Oligonucleotide Microarray Data Sets Are Revealed Using Isomap, a Nonlinear Algorithm," BMC Bioinformatics, vol. 6, p. 195, 2005.
 
35
C. Truntzer, C. Mercier, J. Estève, C. Gautier, and P. Roy, "Importance of Data Structure in Comparing Two Dimension Reduction Methods for Classification of Microarray Gene Expression Data," BMC Bioinformatics, vol. 8, no. 90, 2007.
 
36
A. Andersson, T. Olofsson, D. Lindgren, B. Nilsson, C. Ritz, P. Eden, C. Lassen, J. Rade, M. Fontes, H. Morse, J. Heldrup, M. Behrendtz, F. Mitelman, M. Hoglund, B. Johansson, and T. Fioretos, "Molecular Signatures in Childhood Acute Leukemia and Their Correlations to Expression Patterns in Normal Hematopoietic Subpopulations," Proc. Nat'l Academy of Sciences, vol. 102, no. 52, pp. 19069-19074, 2005.
 
37
Y. Zhu, R. Wu, N. Sangha, C. Yoo, K.R. Cho, K.A. Shedden, H. Katabuchi, and D.M. Lubman, "Classifications of Ovarian Cancer Tissues by Proteomic Patterns," Proteomics, vol. 6, pp. 5846-5856, 2006.
 
38
M.A. Mendez, C. Hodar, C. Vulpe, and M. Gonzalez, "Discriminant Analysis to Evaluate Clustering of Gene Expression Data," FEBS Letters, vol. 522, pp. 24-28, 2002.
 
39
H. Hotelling, "Analysis of a Complex of Statistical Variables into Principal Components," J. Educational Psychology, vol. 24, pp. 417-441, 1933.
 
40
 
41
 
42
J. Tenenbaum, V. de Silva, and J.C. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 5500, pp. 2319-2322, 2000.
 
43
S. Roweis and L. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
 
44
 
45
A. Madabhushi, J. Shi, M. Rosen, J.E. Tomaszewski, and M.D. Feldman, "Graph Embedding to Improve Supervised Classification and Novel Class Detection: Application to Prostate Cancer," Proc. Eighth Int'l Conf. Medical Image Computing and Computer-Assisted Intervention (MICCAI '05), pp. 729-737, 2005.
 
46
P. Tiwari, A. Madabhushi, and M. Rosen, "A Hierarchical Unsupervised Spectral Clustering Scheme for Detection of Prostate Cancer from Magnetic Resonance Spectroscopy (MRS)," Proc. 10th Int'l Conf. Medical Image Computing and Computer-Assisted Intervention (MICCAI '07), vol. 2, pp. 278-286, 2007.
 
47
S. Doyle, M. Hwang, K. Shah, A. Madabhushi, M. Feldman, and J. Tomaszewski, "Automated Grading of Prostate Cancer Using Architectural and Textural Image Features," Proc. Fourth IEEE Int'l Symp. Biomedical Imaging (ISBI '07), pp. 1284-1287, 2007.
 
48
S. Doyle, M. Hwang, S. Naik, M. Feldman, J. Tomaszewski, and A. Madabhushi, "Using Manifold Learning for Content-Based Image Retrieval of Prostate Histopathology," Proc. 10th Int'l Conf. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2007.
 
49
S. Weng, C. Zhang, Z. Lin, and X. Zhang, "Mining the Structural Knowledge of High-Dimensional Medical Data Using Isomap," Medical and Biological Eng. and Computing, vol. 43, pp. 410-412, 2005.
 
50
 
51
 
52
A. Madabhushi, J. Shi, M.D. Feldman, M. Rosen, and J. Tomaszewski, "Comparing Ensembles of Learners: Detecting Prostate Cancer from High Resolution MRI," Proc. Second Int'l Workshop Computer Vision Approaches to Medical Image Analysis (CVAMIA '06), pp. 25-36, 2006.
 
53
 
54
G.J. Gordon, R.V. Jensen, L.-L. Hsiao, S.R. Gullans, J.E. Blumenstock, S. Ramaswamy, W.G. Richards, D.J. Sugarbaker, and R. Bueno, "Translation of Microarray Data into Clinically Relevant Cancer Diagnostic Tests Using Gene Expression Ratios in Lung Cancer and Mesothelioma," Cancer Research, vol. 62, pp. 4963-4967, 2002.
 
55
 
56
J.R. Quinlan, "Bagging, Boosting, and C4.5," Proc. 13th Nat'l Conf. Artificial Intelligence and Eighth Innovative Applications of Artificial Intelligence Conf. (AAAI/IAAI '96), vol. 1, pp. 725-730, 1996.
 
57
 
58
 
59
F. Kovács, C. Legány, and A. Babos, "Cluster Validity Measurement Techniques," Proc. Sixth Int'l Symp. Hungarian Researchers on Computational Intelligence (CINTI), 2005.
