ABSTRACT
Genome sequencing projects and high-throughput technologies such as DNA and protein arrays have resulted in a very large amount of information-rich data. Microarray experimental data are a valuable but limited source for inferring gene regulation mechanisms on a genomic scale. Additional information, such as the promoter sequences of genes and DNA-binding motifs, gene ontologies, and location data, can increase the statistical significance of the findings when combined with gene expression analysis. This paper introduces a machine learning approach to information fusion for combining heterogeneous genomic data. The algorithm uses an unsupervised joint learning mechanism that identifies clusters of genes using the combined data. The correlation between gene expression time-series patterns obtained under different experimental conditions and the presence of several distinct and repeated motifs in the genes' upstream sequences is examined here using publicly available yeast cell-cycle data. The results show that the combined learning approach identifies correlated genes effectively. The algorithm provides an automated clustering method but allows the user to specify, a priori, the influence of each data type on the final clustering using probabilities.
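The abstract does not spell out the joint learning algorithm, so the following is only a minimal sketch of one way to cluster genes on combined data with user-set probabilities as data-type weights. It is a swapped-in technique, not the paper's method: a hard, k-means-style clustering in which each gene is a tuple of probability distributions (one per data type) and its distance to a cluster is a weighted sum of Kullback-Leibler divergences. The encodings in `encode_gene` (exponentiated log2 ratios, motif counts with a pseudocount), the farthest-first seeding, and every function name here are hypothetical assumptions.

```python
import math

def normalize(values, pseudocount=0.0):
    """Scale a non-negative vector to sum to 1; a pseudocount keeps every entry > 0."""
    shifted = [v + pseudocount for v in values]
    total = sum(shifted)
    return [v / total for v in shifted]

def encode_gene(log2_ratios, motif_counts):
    """Hypothetical encoding: one probability vector per data type.
    Expression: 2**ratio normalized over time points (always > 0).
    Motifs: upstream motif occurrence counts plus a small pseudocount."""
    expression = normalize([2.0 ** x for x in log2_ratios])
    motifs = normalize(motif_counts, pseudocount=1e-3)
    return (expression, motifs)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats (q must be > 0 where p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint_divergence(gene, centroid, weights):
    """Weighted sum of per-data-type divergences; weights act as the a priori influences."""
    return sum(w * kl(g, c) for w, g, c in zip(weights, gene, centroid))

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fused_clustering(genes, weights, k, max_iter=100):
    """Hard clustering of genes under the weighted joint divergence (hypothetical sketch)."""
    if abs(sum(weights) - 1.0) > 1e-9 or len(weights) != len(genes[0]):
        raise ValueError("need one weight per data type, summing to 1")
    if not 1 <= k <= len(genes):
        raise ValueError("k must be between 1 and the number of genes")
    # Deterministic farthest-first seeding: start from the first gene, then
    # repeatedly add the gene whose nearest seed is farthest away.
    centroids = [genes[0]]
    while len(centroids) < k:
        centroids.append(max(genes, key=lambda g: min(
            joint_divergence(g, c, weights) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins the centroid with the smallest divergence.
        new_labels = [min(range(k), key=lambda j: joint_divergence(g, centroids[j], weights))
                      for g in genes]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: for D(p || c), the mean of the members' distributions minimizes
        # the summed divergence, so each data type's centroid is that average.
        for j in range(k):
            members = [g for g, lab in zip(genes, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean_vector([m[t] for m in members])
                                     for t in range(len(weights)))
    return labels

# Toy data: two expression trends, each paired with a distinct upstream motif.
genes = [
    encode_gene([2, 1, -1, -2], [3, 0, 0]),
    encode_gene([2, 1.5, -1, -2], [2, 0, 0]),
    encode_gene([-2, -1, 1, 2], [0, 0, 3]),
    encode_gene([-2, -1, 1.5, 2], [0, 1, 3]),
]
print(fused_clustering(genes, weights=(0.5, 0.5), k=2))  # [0, 0, 1, 1]
```

In this sketch, setting a weight to 0 drops that data type from the clustering entirely, while shifting weight toward the motif vectors lets shared upstream motifs dominate the grouping. The mean-centroid update is valid only because the centroid is the second argument of the divergence.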