ACM Home Page
Please provide us with feedback. Feedback
Efficient algorithms for inverting evolution
Full text PdfPdf (121 KB)
Source Journal of the ACM (JACM) archive
Volume 46 ,  Issue 4  (July 1999) table of contents
Pages: 437 - 449  
Year of Publication: 1999
ISSN:0004-5411
Authors
Martin Farach  Rutgers Univ., New Brunswick, NJ
Sampath Kannan  Univ. of Pennsylvania, Philadelphia
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 30,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/320211.320212
What is a DOI?

ABSTRACT

Evolution can be mathematically modelled by a stochastic process that operates on the DNA of species. Such models are based on the established theory that the DNA sequences, or genomes, of all extant species have been derived from the genome of the common ancestor of all species by a process of random mutation and natural selection.A stochastic model of evolution can be used to construct phylogenies, or evolutionary trees, for a set of species. Maximum Likelihood Estimation (MLE) methods seek the evolutionary tree which is most likely to have produced the DNA under consideration. While these methods are intellectually satisfying, they have not been widely accepted because of their computational intractability.In this paper, we address the intractability of MLE methods as follows: We introduce a metric on stochastic process models of evolution. We show that this metric is meaningful by proving that in order for any algorithm to distinguish between two stochastic models that are close according to this metric, it needs to be given many observations. We complement this result with a simple and efficient algorithm for inverting the stochastic process of evolution, that is, for building a tree from observations on two-state characters. (We will use the same techniques in a subsequent paper to solve the problem for multistate characters, and hence for building a tree from DNA sequence data.) The tree we build is provably close, in our metric, to the tree generating the data and gets closer as more observations become available.Though there have been many heuristics suggested for the problem of finding good approximations to the most likely tree, our algorithm is the first one with a guaranteed convergence rate, and further, this rate is within a polynomial of the lower-bound rate we establish. Ours is also the first polynomial-time algorithm that is proven to converge at all to the correct tree.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
CAVENDER, J. 1978. Taxonomy with confidence. Math. Biosci. 40, 271-280.
 
4
 
5
6
 
7
FARRIS, J.S. 1972. Estimating phylogenetic trees from distance matrices. Am. Nat. 106, 645-668.
 
8
FELSENSTEIN, J. 1983. Statistical inference of phylogenies. J. Roy. Stat. Soc. A 146, 246-272.
 
9
FELSENSTEIN, J. 1988. Phylogenies from molecular sequences: Inference and reliability. Ann. Rev. Gen. 22, 521-565.
 
10
KASHYAP, R. L., AND SUBAS, S. 1974. Statistical estimation of parameters in a phylogenetic tree using a dynamic model of the substitutional process. J. Theor. Biol., 47, 74-101.
11
 
12
 
13
NEI, M. 1987. Molecular Evolutionary Genetics. Columbia University Press, New York.
 
14
STEEL, M., HENDY, M. D., AND PENNY, D. 1994. A discrete fourier analysis for evolutionary trees. Proc. Natl. Acad. Sci. 91, 3339-3343.
 
15
 
16
SWOFFORD, D. L., AND OLSEN, G. J. 1990. Phylogeny reconstruction. In Molecular Systematics, D. M. Hillis and C. Moritz, eds. Sinauer Associates Inc., Sunderland, Mass., pp. 411-501.


Collaborative Colleagues:
Martin Farach: colleagues
Sampath Kannan: colleagues