ACM Home Page
Please provide us with feedback. Feedback
A new approach to fragment assembly in DNA sequencing
Full text PdfPdf (241 KB)
Source Annual Conference on Research in Computational Molecular Biology archive
Proceedings of the fifth annual international conference on Computational biology table of contents
Montreal, Quebec, Canada
Pages: 256 - 267  
Year of Publication: 2001
ISBN:1-58113-353-7
Authors
Pavel A. Pevzner  Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA
Haixu Tang  Department of Mathematics, University of Southern, California, Los Angeles, CA
Michael S. Waterman  Department of Mathematics, University of Southern, California, Los Angeles, CA
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 56,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues   peer to peer  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/369133.369230
What is a DOI?

ABSTRACT

For the last twenty years fragment assembly in DNA sequencing followed the “overlap - layout - consensus” paradigm that is used in all currently available assembly tools. Although this approach proved to be useful in assembling clones, it faces difficulties in genomic shotgun assembly: the existing algorithms make assembly errors and are often unable to resolve repeats even in prokaryotic genomes. Biologists are well-aware of these errors and are forced to carry additional experiments to verify the assembled contigs.

We abandon the classical “overlap - layout - consensus” approach in favor of a new Eulerian Superpath approach that, for the first time, resolves the problem of repeats in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem. This reduction opens new possibilities for repeat resolution and allows one to generate error-free solutions of the large-scale fragment assembly problems. The major improvement of EULER over other algorithms is that it resolves all repeats except long perfect repeats that are theoretically impossible to resolve without additional experiments.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. Bolotin, S. Mauger, K. Malarme, S. D. Ehrlich, and A. Sorokin. Low-redundancy sequencing of the entire lactococcus lactis IL1403 genome. Antonie van Leeuwenhoek, 76:27-76, 1999.
 
2
J. K. Bonfield, K. F. Smith, and R. Staden. A new DNA sequence assembly program. Nucleic Acids Research, 23:4992-4999, 1995.
 
3
R. Drmanac, I. Labat, I. Brukner, and R. Crkvenjakov. Sequencing of megabase plus DNA by hybridization: theory of the method. Genomics, 4:114-128, 1989.
 
4
D. Gordon, C. Abajian, and P. Green. Consed: a graphical tool for sequence finishing. Genome Research, 8(3):195-202, 1998.
 
5
P. Green. Documentation for phrap. http://bozeman.mbt.washington.edu/phrap.docs/phrap, html, 1994.
6
 
7
X. Huang and A. Madan. CAP3: A DNA sequence assembly program. Genome Research, 9:868-877, 1999.
 
8
R. Idury and M. Waterman. A new algorithm for DNA sequence assembly. Journal of Computational Biology, 2:291-306, 1995.
 
9
Y. Lysov, V. Florentev, A. Khorlin, K. Khrapko, V. Shik, and A. Mirzabekov. DNA sequencing by hybridization with oligonucleotides. Doklady Academy Nauk USSR, 303:1508-1511, 1988.
 
10
E. M. Myers. Toward simplifying and accurately formulating fragment assembly. Journal of Computational Biology, 2:275-290, 1995.
 
11
E. W. Myers, G. G. Sutton, A. L. Delcher, I. M. Dew, D. P. Fasulo, M. J. Flanigan, S. A. Kravitz, C. M. Mobarry, K. H. Reinert, K. A. Remington, E. L. Anson, R. A. Bolanos, H. H. Chou, C. M. Jordan, A. L. Halpern, S. Lonardi, E. M. Beasley, R. C. Brandon, L. Chen, P. J. Dunn, Z. Lai, Y. Liang, D. R. Nusskern, M. Zhan, Q. Zhang, X. Zheng, G. M. Rubin, M. D. Adams, and J. C. Venter. A whole-genome assembly of Drosophila. Science, 287:2196-2204, 2000.
 
12
J. Parkhill, M. Achtman, K. D. James, S. D. Bentley, C. Churcher, S. R. Klee, G. Morelli, D. Basham, D. Brown, T. Chillingworth, R. M. Davies, P. Davis, K. Devlin, T. Feltwell, N. Hamlin, S. Holroyd, K. Jagels, S. Leather, S. Moule, K. Mungall, M. A. Quail, M. A. Rajandream, K. M. Rutherford, M. Simmonds, J. Skelton, S. Whitehead, B. G. Spratt, and B. G. Barrell. Complete dna sequence of a serogroup a strain of neisseria meningitidis Z2491. Nature, 404:502-506, 2000.
 
13
J. Parkhill, B. Wren, K. Mungall, J. M. Ketley, C. Churcher, D. Basham, T. Chillingworth, R. M. Davies, T. Feltwell, S. Holroyd, K. Jagels, A. Karlyshev, S. Moule, M. J. Pallen, C. W. Penn, Q. A. Quail, M. A. Rajandream, K. M. Rutherford, A. H. van Vliet, S. Whitehead, and B. G. Barrell. The genome sequence of the food-borne pathogen campylobacter jejuni reveals hypervariable sequences. Nature, 403:665-668, 2000.
 
14
 
15
P. Pevzner. Huple DNA sequencing: computer analysis. Journal of Biomolecular Structure and Dynamics, 7:63-73, 1989.
 
16
P. A. Pevzner. Computational Molecular Biology: An Algorithmic Approach. The MIT Pres, 2000.
 
17
J. Roach, C. Boysen, K. Wang, and L. Hood. Pairwise end sequencing: a unified approach to genomic mapping and sequencing. Genomics, 26:345-353, 1995.
 
18
G. Sutton, O. White, M. Adams, and A. Kerlavage. TIGR assembler: A new tool for assembling large shotgun sequencing projects. Genome Science Technology, 1:9-19, 1995.
 
19
H. Tettelin, D. Radune, S. Kasif, H. Khouri, and S. L. Salzberg. Optimized multiplex PCR: emciently closing a whole-genome shotgun sequencing project. Genomics, 62:500-507, 1999.
 
20
M. Waterman. Introduction to Computational Biology. Chapman Hall, 1995.
 
21
J. Weber and G. Myers. Whole genome shotgun sequencing. Genome Research, 7:401-409, 1997.


Collaborative Colleagues:
Pavel A. Pevzner: colleagues
Haixu Tang: colleagues
Michael S. Waterman: colleagues

Peer to Peer - Readers of this Article have also read: