ABSTRACT
In [13] we reported the genome sequences of S. paradoxus, S. mikatae and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genome-wide comparative analysis allowed the identification of functionally important sequences, both coding and non-coding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes.

We developed statistical methods for the systematic de novo identification of regulatory motifs. Without making use of co-regulated gene sets, we discovered virtually all previously known DNA regulatory motifs as well as several noteworthy novel motifs. With the additional use of gene ontology information, expression clusters and transcription factor binding profiles, we assigned candidate functions to the novel motifs discovered.

Our results demonstrate that entirely automatic genome-wide annotation, gene validation, and discovery of regulatory motifs are possible. Our findings are validated by the extensive experimental knowledge in yeast, confirming their applicability to other genomes.
REFERENCES
[1] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. Basic local alignment search tool. J Mol Biol, 215 (3). 403--410.
[2] Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M. and Sherlock, G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25 (1). 25--29.
[3] Blanchette, M., Schwikowski, B. and Tompa, M. Algorithms for phylogenetic footprinting. J Comput Biol, 9 (2). 211--223.
[4] Cliften, P.F., Hillier, L.W., Fulton, L., Graves, T., Miner, T., Gish, W.R., Waterston, R.H. and Johnston, M. Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis. Genome Res, 11 (7). 1175--1186.
[5] Dwight, S.S., Harris, M.A., Dolinski, K., Ball, C.A., Binkley, G., Christie, K.R., Fisk, D.G., Issel-Tarver, L., Schroeder, M., Sherlock, G., Sethuraman, A., Weng, S., Botstein, D. and Cherry, J.M. Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res, 30 (1). 69--72.
[6] Fitch, W.M. Distinguishing homologous from analogous proteins. Syst Zool, 19 (2). 99--113.
[7] Fitch, W.M. Uses for evolutionary trees. Philos Trans R Soc Lond B Biol Sci, 349 (1327). 93--102.
[8] Gasch, A.P. and Eisen, M.B. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol, 3 (11). RESEARCH0059.
[9] Gavin, A.C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.M., Cruciat, C.M., Remor, M., Hofert, C., Schelder, M., Brajenovic, M., Ruffner, H., Merino, A., Klein, K., Hudak, M., Dickson, D., Rudi, T., Gnau, V., Bauch, A., Bastuck, S., Huhse, B., Leutwein, C., Heurtier, M.A., Copley, R.R., Edelmann, A., Querfurth, E., Rybin, V., Drewes, G., Raida, M., Bouwmeester, T., Bork, P., Seraphin, B., Kuster, B., Neubauer, G. and Superti-Furga, G. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415 (6868). 141--147.
[10] Grundy, W.N., Bailey, T.L., Elkan, C.P. and Baker, M.E. Meta-MEME: motif-based hidden Markov models of protein families. Comput Appl Biosci, 13 (4). 397--406.
[11] Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol, 296 (5). 1205--1214.
[12] Jiao, K., Nau, J.J., Cool, M., Gray, W.M., Fassler, J.S. and Malone, R.E. Phylogenetic footprinting reveals multiple regulatory elements involved in control of the meiotic recombination gene, REC102. Yeast, 19 (2). 99--114.
[13] Kamvysselis, M., Patterson, N., Edrizzi, M., Birren, B. and Lander, E.S. submitted.
[14] Keogh, R.S., Seoighe, C. and Wolfe, K.H. Evolution of gene order and chromosome number in Saccharomyces, Kluyveromyces and related fungi. Yeast, 14 (5). 443--457.
[15] Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F. and Wootton, J.C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262 (5131). 208--214.
[16] Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Gerber, G.K., Hannett, N.M., Harbison, C.T., Thompson, C.M., Simon, I., Zeitlinger, J., Jennings, E.G., Murray, H.L., Gordon, D.B., Ren, B., Wyrick, J.J., Tagne, J.B., Volkert, T.L., Fraenkel, E., Gifford, D.K. and Young, R.A. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298 (5594). 799--804.
[17] Liu, X., Brutlag, D.L. and Liu, J.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput. 127--138.
[18] McCue, L., Thompson, W., Carmack, C., Ryan, M.P., Liu, J.S., Derbyshire, V. and Lawrence, C.E. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res, 29 (3). 774--782.
[19] Mewes, H.W., Frishman, D., Guldener, U., Mannhaupt, G., Mayer, K., Mokrejs, M., Morgenstern, B., Munsterkotter, M., Rudd, S. and Weil, B. MIPS: a database for genomes and protein sequences. Nucleic Acids Res, 30 (1). 31--34.
[20] Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol, 16 (10). 939--945.
[21] Simon, I., Barnett, J., Hannett, N., Harbison, C.T., Rinaldi, N.J., Volkert, T.L., Wyrick, J.J., Zeitlinger, J., Gifford, D.K., Jaakkola, T.S. and Young, R.A. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell, 106 (6). 697--708.
[22] Tatusov, R.L., Koonin, E.V. and Lipman, D.J. A genomic perspective on protein families. Science, 278 (5338). 631--637.
[23] Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D. and Koonin, E.V. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res, 29 (1). 22--28.
[24] Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J. and Church, G.M. Systematic determination of genetic network architecture. Nat Genet, 22 (3). 281--285.
[25] Thompson, J.D., Higgins, D.G. and Gibson, T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 22 (22). 4673--4680.
[26] Tompa, M. Identifying functional elements by comparative DNA sequence analysis. Genome Res, 11 (7). 1143--1144.
[27] Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhauser, R., Pruss, M., Schacherer, F., Thiele, S. and Urbach, S. The TRANSFAC system on gene expression regulation. Nucleic Acids Res, 29 (1). 281--283.
[28] Zhang, M.Q. Promoter analysis of co-regulated genes in the yeast genome. Comput Chem, 23 (3--4). 233--250.
[29] Zhu, J. and Zhang, M.Q. SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 15 (7--8). 607--611.
CITED BY 2
Derek Y. Chiang, Alan M. Moses, Manolis Kamvysselis, Eric S. Lander, Michael B. Eisen, Phylogenetically and spatially conserved word pairs associated with gene expression changes in yeasts, Proceedings of the seventh annual international conference on Research in computational molecular biology, p.84-93, April 10-14, 2003, Berlin, Germany