The gene evolution model and computing its associated probabilities
Source
Journal of the ACM (JACM)
Volume 56, Issue 2 (April 2009)
Article No. 7  
Year of Publication: 2009
ISSN:0004-5411
Authors
Lars Arvestad, Royal Institute of Technology and Stockholm Bioinformatics Center, Stockholm, Sweden
Jens Lagergren, Royal Institute of Technology and Stockholm Bioinformatics Center, Stockholm, Sweden
Bengt Sennblad, Stockholm University and Stockholm Bioinformatics Center, Stockholm, Sweden
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1502793.1502796

ABSTRACT

Phylogeny is both a fundamental tool in biology and a rich source of fascinating modeling and algorithmic problems. Today's wealth of sequenced genomes makes it increasingly important to understand evolutionary events such as duplications, losses, transpositions, inversions, lateral transfers, and domain shuffling. We focus on the gene duplication event, which constitutes a major force in the creation of genes with new function [Ohno 1970; Lynch and Force 2000] and thereby also of biodiversity.

We introduce the probabilistic gene evolution model, which describes how a gene tree evolves within a given species tree with respect to speciation, gene duplication, and gene loss. The actual relation between gene tree and species tree is captured by a reconciliation, a concept which we generalize for more expressiveness. The model is a canonical generalization of the classical linear birth-death process, obtained by replacing the interval where the process takes place by a tree.

For the gene evolution model, we derive efficient algorithms for some associated probability distributions: the probability of a reconciled tree, the probability of a gene tree, the maximum probability reconciliation, the posterior probability of a reconciliation, and sampling reconciliations with respect to the posterior probability. These algorithms provide the basis for several applications, including species tree construction, reconciliation analysis, orthology analysis, biogeography, and host-parasite co-evolution.
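
As an illustration of the idea of running a linear birth-death process over a tree rather than an interval, the following is a minimal Python sketch. It is not the authors' algorithm and does not compute any of the probabilities listed above. It assumes constant duplication and loss rates (`birth`, `death`), a hypothetical tuple encoding of the species tree, and that every gene lineage present at a speciation continues into both child edges. The closed-form extinction probability is the standard result for the linear birth-death process [Kendall 1948].

```python
import math
import random


def extinction_probability(birth, death, t):
    """P(a single lineage has no descendants after time t) in a
    linear birth-death process with constant rates."""
    if math.isclose(birth, death):
        return birth * t / (1.0 + birth * t)
    e = math.exp(-(birth - death) * t)
    return death * (1.0 - e) / (birth - death * e)


def evolve_along_edge(n, birth, death, t, rng):
    """Simulate n gene lineages along one species-tree edge of length t.
    Returns the number of lineages present at the end of the edge."""
    total = birth + death
    while n > 0 and total > 0:
        # Waiting time to the next duplication or loss among n lineages.
        t -= rng.expovariate(n * total)
        if t < 0:
            break
        # The event is a duplication with probability birth / total.
        n += 1 if rng.random() < birth / total else -1
    return n


def simulate(tree, birth, death, rng, n=1):
    """tree is (name, edge_length, children), a hypothetical encoding.
    Returns a dict mapping each leaf species to its gene count."""
    name, length, children = tree
    n = evolve_along_edge(n, birth, death, length, rng)
    if not children:
        return {name: n}
    counts = {}
    for child in children:
        # Speciation: each surviving lineage enters every child edge.
        counts.update(simulate(child, birth, death, rng, n))
    return counts


# Illustrative species tree ((A, B), C) with made-up edge lengths and rates.
species_tree = ("root", 0.0, [
    ("AB", 0.5, [("A", 0.5, []), ("B", 0.5, [])]),
    ("C", 1.0, []),
])
print(simulate(species_tree, birth=0.8, death=0.5, rng=random.Random(42)))
```

The sketch tracks only gene counts per species. Recording which lineage duplicated or was lost at each event would be needed to recover a gene tree and its reconciliation.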


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Addario-Berry, L., Chor, B., Hallett, M., Lagergren, J., Panconesi, A., and Wareham, T. 2004. Ancestral maximum likelihood of evolutionary trees is hard. J. Bioinform. Comput. Biol. 2, 2, 257--271.
 
2
Arvestad, L., Berglund, A.-C., Lagergren, J., and Sennblad, B. 2003. Bayesian gene/species tree reconciliation and orthology analysis using MCMC. Bioinformatics 19 Suppl 1, i7--15.
3
 
4
Bader, D. A., Moret, B. M., and Yan, M. 2001. A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. J. Comput. Biol. 8, 5, 483--491.
 
5
Beck, R. M. D., Bininda-Emonds, O. R. P., Cardillo, M., Liu, F.-G. R., and Purvis, A. 2006. A higher-level MRP supertree of placental mammals. BMC Evol. Biol. 6, 93.
 
6
 
7
Bergeron, A., Mixtacki, J., and Stoye, J. 2004. Reversal distance without hurdles and fortresses. In Lecture Notes in Computer Science, Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM 2004). Springer-Verlag, Berlin, Germany, 388--399.
 
8
 
9
Bininda-Emonds, O. R. P., Gittleman, J. L., and Steel, M. A. 2002. The (super)tree of life: Procedures, problems, and prospects. Annu. Rev. Ecol. Syst. 33, 265--289.
 
10
Brown, J. K. M. 1994. Probabilities of evolutionary trees. Syst. Biol. 43, 1, 78--91.
 
11
Bull, J. J., Huelsenbeck, J. P., Cunningham, C. W., Swofford, D. L., and Waddell, P. J. 1993. Partitioning and combining data in phylogenetic analysis. Syst. Biol. 42, 3, 384--397.
 
12
 
13
Cheng, Z., Ventura, M., She, X., Khaitovich, P., Graves, T., Osoegawa, K., Church, D., DeJong, P., Wilson, R., Pääbo, S., Rocchi, M., and Eichler, E. E. 2005. A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature 437, 7055, 88--93.
 
14
Chor, B., and Tuller, T. 2005. Maximum likelihood of evolutionary trees is hard. In Lecture Notes in Computer Science, Proceedings of the 9th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2005). Springer-Verlag, Berlin, Germany, 296--310.
 
15
 
16
Csürös, M., and Miklós, I. 2006. A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer. In Lecture Notes in Computer Science. Vol. 3909. Springer-Verlag, Berlin, Germany, 206--220.
 
17
Dayrat, B. 2003. The roots of phylogeny: How did Haeckel build his trees? Syst. Biol. 52, 4, 515--527.
 
18
Degnan, J. H., and Salter, L. A. 2005. Gene tree distributions under the coalescent process. Evolution 59, 1, 24--37.
 
19
Doolittle, W. F., Boucher, Y., Nesbø, C. L., Douady, C. J., Andersson, J. O., and Roger, A. J. 2003. How big is the iceberg of which organellar genes in nuclear genomes are but the tip? Philos. Trans. Roy. Soc. London B. Biol. Sci. 358, 1429, 39--57.
 
20
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17, 6, 368--376.
 
21
Felsenstein, J. 2003. Inferring phylogenies. Sinauer Associates.
 
22
Fitch, W. M. 1970. Distinguishing homologous from analogous proteins. Syst. Zool. 19, 2, 99--113.
 
23
Fitzpatrick, D., Logue, M., Stajich, J., and Butler, G. 2006. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evol. Biol. 6, 1, 99.
 
24
Gogarten, J. P., and Townsend, J. P. 2005. Horizontal gene transfer, genome innovation and evolution. Nat. Rev. Micro. 3, 9, 679--687.
 
25
Goodman, M., Czelusniak, J., Moore, G. W., Romero-Herrera, A. E., and Matsuda, G. 1979. Fitting the gene lineage into its species lineage: A parsimony strategy illustrated by cladograms constructed from globin sequences. Syst. Zool. 28, 132--168.
 
26
Hafner, M. S., Sudman, P. D., Villablanca, F. X., Spradling, T. A., Demastes, J. W., and Nadler, S. A. 1994. Disparate rates of molecular evolution in cospeciating hosts and parasites. Science 265, 5175, 1087--1090.
 
27
Hahn, M. W., De Bie, T., Stajich, J. E., Nguyen, C., and Cristianini, N. 2005. Estimating the tempo and mode of gene family evolution from comparative genomic data. Genome Res. 15, 8, 1153--1160.
28
 
29
Harding, E. F. 1971. The probabilities of rooted tree-shapes generated by random bifurcation. Adv. Appl. Prob. 3, 1, 44--77.
 
30
Hudson, R. R. 1983. Testing the constant-rate neutral allele model with protein sequence data. Evolution 37, 1, 203--217.
 
31
Huelsenbeck, J. P., Rannala, B., and Yang, Z. 1997. Statistical tests of host-parasite cospeciation. Evolution 51, 2, 410--419.
 
32
Huelsenbeck, J. P., and Ronquist, F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17, 8, 754--755.
 
33
Jeffroy, O., Brinkmann, H., Delsuc, F., and Philippe, H. 2006. Phylogenomics: the beginning of incongruence? Trends Genet. 22, 4, 225--231.
 
34
 
35
Kendall, D. G. 1948. On the generalized “birth-and-death” process. Ann. Math. Stat. 19, 1--15.
 
36
Larget, B., and Simon, D. L. 1999. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16, 6, 750--759.
 
37
Liu, L., and Pearl, D. K. 2007. Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst. Biol. 56, 3, 504--514.
 
38
Lynch, M., and Force, A. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics 154, 1, 459--473.
 
39
Mirkin, B. G., Fenner, T. I., Galperin, M. Y., and Koonin, E. V. 2003. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 3, 2.
 
40
Nee, S., May, R. M., and Harvey, P. H. 1994. The reconstructed evolutionary process. Philos. Trans. Roy. Soc. London, B. Biol. Sci. 344, 1309, 305--311.
 
41
Nelson, G., and Platnick, N. I. 1981. Systematics and biogeography: Cladistics and vicariance. Columbia University Press, New York.
 
42
Ohno, S. 1970. Evolution by Gene Duplication. Springer-Verlag, Berlin, Germany.
 
43
Page, R. D. M. 1994. Maps between trees and cladistic-analysis of historical associations among genes, organisms, and areas. Syst. Biol. 43, 1, 58--77.
 
44
Page, R. D., and Charleston, M. A. 1997. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol. Phylogenet. Evol. 7, 2, 231--240.
 
45
Page, R. D., and Cotton, J. A. 2002. Vertebrate phylogenomics: reconciled trees and gene duplications. In BIOCOMPUTING 2002: Proceedings of the Pacific Symposium. World Scientific Publishing, 536--547.
 
46
Pamilo, P., and Nei, M. 1988. Relationships between gene trees and species trees. Mol. Biol. Evol. 5, 5, 568--583.
 
47
Przytycka, T., Davis, G., Song, N., and Durand, D. 2006. Graph theoretical insights into evolution of multidomain proteins. J. Comput. Biol. 13, 2, 351--363.
 
48
Rannala, B., and Yang, Z. 2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164, 4, 1645--1656.
 
49
Remm, M., Storm, C. E., and Sonnhammer, E. L. 2001. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol. 314, 5, 1041--1052.
 
50
Robinson-Rechavi, M., Maina, C. V., Gissendanner, C. R., Laudet, V., and Sluder, A. 2005. Explosive lineage-specific expansion of the orphan nuclear receptor HNF4 in nematodes. J. Mol. Evol. 60, 5, 577--586.
 
51
Rokas, A., Williams, B. L., King, N., and Carroll, S. B. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425, 6960, 798--804.
 
52
Storm, C. E., and Sonnhammer, E. L. 2002. Automated ortholog inference from phylogenetic trees and calculation of orthology reliability. Bioinformatics 18, 1, 92--99.
 
53
Storm, C. E., and Sonnhammer, E. L. 2003. Comprehensive analysis of orthologous protein domains using the HOPS database. Genome Res. 13, 10, 2353--2362.
 
54
Suzuki, Y., Glazko, G. V., and Nei, M. 2002. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Nat. Acad. Sci. 99, 25, 16138--16143.
 
55
Swenson, U., Backlund, A., McLoughlin, S., and Hill, R. S. 2001. Nothofagus biogeography revisited with special emphasis on the enigmatic distribution of subgenus Brassospora in New Caledonia. Cladistics 17, 1, 28--47.
 
56
Tajima, F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics 105, 2, 437--460.
 
57
Tannier, E., and Sagot, M.-F. 2004. Sorting by reversals in subquadratic time. In Lecture Notes in Computer Science, Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching (CPM 2004). Springer-Verlag, Berlin, Germany, 1--13.
 
58
Tatusov, R. L., Koonin, E. V., and Lipman, D. J. 1997. A genomic perspective on protein families. Science 278, 5338 (Oct), 631--637.
 
59
Thompson, E. A. 1975. Human Evolutionary Trees. Cambridge University Press, Cambridge.
 
60
Yang, Z., and Rannala, B. 1997. Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method. Mol. Biol. Evol. 14, 7, 717--724.
 
61
Yuan, Y. P., Eulenstein, O., Vingron, M., and Bork, P. 1998. Towards detection of orthologues in sequence databases. Bioinformatics 14, 3, 285--289.
 
62
Zmasek, C. M., and Eddy, S. R. 2002. RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics 3, 1, 14.
