ABSTRACT
A novel approach, model-based clustering, is described for identifying complex interactions between genes or gene categories based on static gene expression data. The approach deals with categorical data, which consist of a set of gene expression profiles belonging to one category and a set belonging to another category. An evolutionary algorithm (Meta-Optimizing Semantic Evolutionary Search, or MOSES) is used to learn an ensemble of classification models that distinguish the two categories, using as inputs features that correspond to gene expression values. Each feature is associated with a model-based vector, which encodes quantitative information about how the feature is used across the ensemble of models. Two different ways of constructing these vectors are explored. These model-based vectors are then clustered using a variant of hierarchical clustering called Omniclust. The result is a set of model-based clusters, in which features are grouped together if they are often considered together by classification models. This may be because they are co-expressed, or it may be for subtler reasons involving multi-gene interactions. The method is illustrated by applying it to two human gene expression datasets: one drawn from brain cells and pertinent to the neurogenetics of aging, and the other drawn from blood cells and relating to differentiating between types of lymphoma. We find that, compared to traditional expression-based clustering, the new method often yields clusters that have higher mathematical quality (in the sense of homogeneity and separation) and also yields novel and meaningful insights into the underlying biological processes.
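The abstract does not specify how the model-based vectors are built or how Omniclust works, so the following is only a minimal sketch under stated assumptions. Each MOSES model is reduced to a hypothetical list of the features it references. One possible vector construction is used, in which entry m counts how often the feature appears in model m. Plain average-linkage agglomerative clustering on cosine similarity is swapped in for Omniclust. Homogeneity and separation follow one common formulation: average similarity of each item to its own cluster centroid (higher is better), and size-weighted average similarity between centroids (lower is better). The names `ENSEMBLE`, `usage_vectors`, and `average_linkage` are illustrative and not taken from the paper.

```python
import math
from itertools import combinations

# Hypothetical toy ensemble: each classification model is reduced to the list
# of features (genes) it references; repeated mentions count as repeated uses.
ENSEMBLE = [
    ["g1", "g2", "g3"],
    ["g1", "g2"],
    ["g3", "g4"],
    ["g4", "g3", "g4"],
    ["g1", "g2", "g2"],
]

def usage_vectors(ensemble):
    """Assumed construction: entry m of a feature's vector = its use count in model m."""
    features = sorted({f for model in ensemble for f in model})
    return {f: [model.count(f) for model in ensemble] for f in features}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def average_linkage(vectors, k):
    """Agglomerative clustering (average linkage, cosine similarity) stopped at
    k clusters. A generic stand-in for Omniclust, not a reimplementation of it."""
    clusters = [[f] for f in vectors]

    def link(a, b):
        return sum(cosine(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > k:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def centroid(vectors, cluster):
    dim = len(vectors[cluster[0]])
    return [sum(vectors[f][m] for f in cluster) / len(cluster) for m in range(dim)]

def homogeneity(vectors, clusters):
    """Average similarity of each feature to its own cluster centroid (higher is better)."""
    total = sum(cosine(vectors[f], centroid(vectors, c)) for c in clusters for f in c)
    return total / sum(len(c) for c in clusters)

def separation(vectors, clusters):
    """Size-weighted average similarity between cluster centroids (lower is better)."""
    cents = [centroid(vectors, c) for c in clusters]
    num = den = 0.0
    for (a, ca), (b, cb) in combinations(zip(clusters, cents), 2):
        w = len(a) * len(b)
        num += w * cosine(ca, cb)
        den += w
    return num / den if den else 0.0

if __name__ == "__main__":
    vecs = usage_vectors(ENSEMBLE)
    clusters = average_linkage(vecs, k=2)
    print(clusters)  # [['g1', 'g2'], ['g3', 'g4']]
    print(round(homogeneity(vecs, clusters), 3), round(separation(vecs, clusters), 3))
```

In this toy ensemble, g1 and g2 tend to appear in the same models, as do g3 and g4, so they end up in the same clusters. This grouping reflects co-usage by the models rather than co-expression, which is the intuition the abstract describes.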