|
ABSTRACT
We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
G.E. Blelloch, K. Dhamdhere, E. Halperin, R. Ravi, R. Schwartz, and S. Sridhar, “Fixed Parameter Tractability of Binary Near-Perfect Phylogenetic Tree Reconstruction,” Proc. 33rd Int'l Colloquium Automata, Languages, and Programming, 2006.
|
| |
3
|
|
| |
4
|
Hans L. Bodlaender , Michael R. Fellows , Michael T. Hallett , H. Todd Wareham , Tandy J. Warnow, The hardness of perfect phylogeny, feasible register assignment and other problems on thin colored graphs, Theoretical Computer Science, v.244 n.1-2, p.167-188, Aug. 6, 2000
[doi> 10.1016/S0304-3975(98)00342-9]
|
| |
5
|
M. Bonet, M. Steel, T. Warnow, and S. Yooseph, “Better Methods for Solving Parsimony and Compatibility,” J. Computational Biology, vol. 5, no. 3, pp. 409-422, 1992.
|
| |
6
|
R.G. Downey and M.R. Fellows, Parameterized Complexity. Springer, 1999.
|
| |
7
|
W.H. Day and D. Sankoff, “Computational Complexity of Inferring Phylogenies by Compatibility,” Systematic Zoology, vol. 35, pp. 224-229, 1986.
|
| |
8
|
P. Damaschke, “Parameterized Enumeration, Transversals, and Imperfect Phylogeny Reconstruction,” Proc. Int'l Workshop Parameterized and Exact Computation, pp. 1-12, 2004.
|
| |
9
|
E. Eskin, E. Halperin, and R.M. Karp, “Efficient Reconstruction of Haplotype Structure via Perfect Phylogeny,” J. Bioinformatics and Computational Biology, vol. 1, no. 1, pp. 1-20, 2003.
|
| |
10
|
J. Felsenstein, PHYLIP (Phylogeny Inference Package) version 3.6, distributed by the author, Dept. of Genome Sciences, Univ. of Washington, Seattle, 2005.
|
| |
11
|
|
| |
12
|
L.R. Foulds and R.L. Graham, “The Steiner Problem in Phylogeny Is NP-Complete,” Advances in Applied Math., vol. 3, pp. 43-49, 1982.
|
| |
13
|
G. Ganapathy, V. Ramachandran, and T. Warnow, “Better Hill-Climbing Searches for Parsimony,” Proc. Third Int'l Workshop Algorithms in Bioinformatics (WABI '03), pp. 254-258, 2003.
|
| |
14
|
D. Gusfield, “Efficient Algorithms for Inferring Evolutionary Trees,” Networks, vol. 21, pp. 19-28, 1991.
|
| |
15
|
|
| |
16
|
D. Gusfield and V. Bansal, “A Fundamental Decomposition Theory for Phylogenetic Networks and Incompatible Characters,” Proc. Ninth Ann. Int'l Conf. Research in Computational Molecular Biology, pp. 217-232, 2005.
|
| |
17
|
|
| |
18
|
D.A. Hinds, L.L. Stuve, G.B. Nilsen, E. Halperin, E. Eskin, D.G. Ballinger, K.A. Frazer, and D.R. Cox, “Whole Genome Patterns of Common DNA Variation in Three Human Populations,” Science, vol. 307, no. 5712, pp. 1072-1079, 2005.
|
| |
19
|
The International HapMap Consortium, “The International HapMap Project,” Nature, vol. 426, pp. 789-796, 2003.
|
| |
20
|
|
| |
21
|
M. Merimaa, M. Liivak, E. Heinaru, J. Truu, and A. Heinaru, “Functional Co-Adaption of Phenol Hydroxylase and Catechol 2,3-Dioxygenase Genes in Bacteria Possessing Different Phenol and P-Cresol Degradation Pathways,” Proc. Eighth Symp. Bacterial Genetics and Ecology, 2005.
|
| |
22
|
H.J. Promel and A. Steger, The Steiner Tree Problem: A Tour through Graphs Algorithms and Complexity. Vieweg Verlag, 2002.
|
| |
23
|
S. Sridhar, K. Dhamdhere, G.E. Blelloch, E. Halperin, R. Ravi, and R. Schwartz, “Simple Reconstruction of Binary Near-Perfect Phylogenetic Trees,” Proc. Int'l Workshop Bioinformatics Research and Applications, 2006.
|
| |
24
|
C. Semple and M. Steel, Phylogenetics. Oxford Univ. Press, 2003.
|
| |
25
|
M.A. Steel, “The Complexity of Reconstructing Trees from Qualitative Characters and Subtrees,” J. Classification, vol. 9, pp.91-116, 1992.
|
| |
26
|
S.T. Sherry, M.H. Ward, M. Kholodov, J. Baker, L. Pham, E. Smigielski, and K. Sirotkin, “dbSNP: The NCBI Database of Genetic Variation,” Nucleic Acids Research, vol. 29, pp. 308-311, 2001.
|
| |
27
|
A.C. Stone, R.C. Griffiths, S.L. Zegura, and M.F. Hammer, “High Levels of Y-Chromosome Nucleotide Diversity in the Genus Pan,” Proc. Nat'l Academy of Sciences USA, vol. 99, no. 1, pp. 43-48, 2002.
|
| |
28
|
T. Wirth, X. Wang, B. Linz, R.P. Novick, J.K. Lum, M. Blaser, G. Morelli, D. Falush, and M. Achtman, “Distinguishing Human Ethnic Groups by Means of Sequences from Helicobacter pylori: Lessons from Ladakh,” Proc. Nat'l Academy of Sciences USA, vol. 101, no. 14, pp. 4746-4751, 2004.
|
|