|
ABSTRACT
For the last twenty years fragment assembly in DNA sequencing followed the “overlap - layout - consensus” paradigm that is used in all currently available assembly tools. Although this approach proved to be useful in assembling clones, it faces difficulties in genomic shotgun assembly: the existing algorithms make assembly errors and are often unable to resolve repeats even in prokaryotic genomes. Biologists are well-aware of these errors and are forced to carry additional experiments to verify the assembled contigs.
We abandon the classical “overlap - layout - consensus” approach in favor of a new Eulerian Superpath approach that, for the first time, resolves the problem of repeats in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem. This reduction opens new possibilities for repeat resolution and allows one to generate error-free solutions of the large-scale fragment assembly problems. The major improvement of EULER over other algorithms is that it resolves all repeats except long perfect repeats that are theoretically impossible to resolve without additional experiments.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. Bolotin, S. Mauger, K. Malarme, S. D. Ehrlich, and A. Sorokin. Low-redundancy sequencing of the entire lactococcus lactis IL1403 genome. Antonie van Leeuwenhoek, 76:27-76, 1999.
|
| |
2
|
J. K. Bonfield, K. F. Smith, and R. Staden. A new DNA sequence assembly program. Nucleic Acids Research, 23:4992-4999, 1995.
|
| |
3
|
R. Drmanac, I. Labat, I. Brukner, and R. Crkvenjakov. Sequencing of megabase plus DNA by hybridization: theory of the method. Genomics, 4:114-128, 1989.
|
| |
4
|
D. Gordon, C. Abajian, and P. Green. Consed: a graphical tool for sequence finishing. Genome Research, 8(3):195-202, 1998.
|
| |
5
|
P. Green. Documentation for phrap. http://bozeman.mbt.washington.edu/phrap.docs/phrap, html, 1994.
|
 |
6
|
|
| |
7
|
X. Huang and A. Madan. CAP3: A DNA sequence assembly program. Genome Research, 9:868-877, 1999.
|
| |
8
|
R. Idury and M. Waterman. A new algorithm for DNA sequence assembly. Journal of Computational Biology, 2:291-306, 1995.
|
| |
9
|
Y. Lysov, V. Florentev, A. Khorlin, K. Khrapko, V. Shik, and A. Mirzabekov. DNA sequencing by hybridization with oligonucleotides. Doklady Academy Nauk USSR, 303:1508-1511, 1988.
|
| |
10
|
E. M. Myers. Toward simplifying and accurately formulating fragment assembly. Journal of Computational Biology, 2:275-290, 1995.
|
| |
11
|
E. W. Myers, G. G. Sutton, A. L. Delcher, I. M. Dew, D. P. Fasulo, M. J. Flanigan, S. A. Kravitz, C. M. Mobarry, K. H. Reinert, K. A. Remington, E. L. Anson, R. A. Bolanos, H. H. Chou, C. M. Jordan, A. L. Halpern, S. Lonardi, E. M. Beasley, R. C. Brandon, L. Chen, P. J. Dunn, Z. Lai, Y. Liang, D. R. Nusskern, M. Zhan, Q. Zhang, X. Zheng, G. M. Rubin, M. D. Adams, and J. C. Venter. A whole-genome assembly of Drosophila. Science, 287:2196-2204, 2000.
|
| |
12
|
J. Parkhill, M. Achtman, K. D. James, S. D. Bentley, C. Churcher, S. R. Klee, G. Morelli, D. Basham, D. Brown, T. Chillingworth, R. M. Davies, P. Davis, K. Devlin, T. Feltwell, N. Hamlin, S. Holroyd, K. Jagels, S. Leather, S. Moule, K. Mungall, M. A. Quail, M. A. Rajandream, K. M. Rutherford, M. Simmonds, J. Skelton, S. Whitehead, B. G. Spratt, and B. G. Barrell. Complete dna sequence of a serogroup a strain of neisseria meningitidis Z2491. Nature, 404:502-506, 2000.
|
| |
13
|
J. Parkhill, B. Wren, K. Mungall, J. M. Ketley, C. Churcher, D. Basham, T. Chillingworth, R. M. Davies, T. Feltwell, S. Holroyd, K. Jagels, A. Karlyshev, S. Moule, M. J. Pallen, C. W. Penn, Q. A. Quail, M. A. Rajandream, K. M. Rutherford, A. H. van Vliet, S. Whitehead, and B. G. Barrell. The genome sequence of the food-borne pathogen campylobacter jejuni reveals hypervariable sequences. Nature, 403:665-668, 2000.
|
| |
14
|
|
| |
15
|
P. Pevzner. Huple DNA sequencing: computer analysis. Journal of Biomolecular Structure and Dynamics, 7:63-73, 1989.
|
| |
16
|
P. A. Pevzner. Computational Molecular Biology: An Algorithmic Approach. The MIT Pres, 2000.
|
| |
17
|
J. Roach, C. Boysen, K. Wang, and L. Hood. Pairwise end sequencing: a unified approach to genomic mapping and sequencing. Genomics, 26:345-353, 1995.
|
| |
18
|
G. Sutton, O. White, M. Adams, and A. Kerlavage. TIGR assembler: A new tool for assembling large shotgun sequencing projects. Genome Science Technology, 1:9-19, 1995.
|
| |
19
|
H. Tettelin, D. Radune, S. Kasif, H. Khouri, and S. L. Salzberg. Optimized multiplex PCR: emciently closing a whole-genome shotgun sequencing project. Genomics, 62:500-507, 1999.
|
| |
20
|
M. Waterman. Introduction to Computational Biology. Chapman Hall, 1995.
|
| |
21
|
J. Weber and G. Myers. Whole genome shotgun sequencing. Genome Research, 7:401-409, 1997.
|
CITED BY 2
|
|
|
|
|
Jacek Blazewicz , Marcin Bryja , Marek Figlerowicz , Piotr Gawron , Marta Kasprzak , Edward Kirton , Darren Platt , Jakub Przybytek , Aleksandra Swiercz , Lukasz Szajkowski, Brief Communication: Whole genome assembly from 454 sequencing output via modified DNA graph concept, Computational Biology and Chemistry, v.33 n.3, p.224-230, June, 2009
|
|