| A new method for indexing genomes using on-disk suffix trees |
| Full text |
Pdf
(689 KB)
|
Source
|
Conference on Information and Knowledge Management
archive
Proceeding of the 17th ACM conference on Information and knowledge management
table of contents
Napa Valley, California, USA
SESSION: DB: indexing and physical query optimization
table of contents
Pages 649-658
Year of Publication: 2008
ISBN:978-1-59593-991-3
|
|
Authors
|
|
Marina Barsky
|
University of Victoria, Victoria, BC, Canada
|
|
Ulrike Stege
|
University of Victoria, Victoria, BC, Canada
|
|
Alex Thomo
|
University of Victoria, Victoria, BC, Canada
|
|
Chris Upton
|
University of Victoria, Victoria, BC, Canada
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 14, Downloads (12 Months): 122, Citation Count: 0
|
|
|
ABSTRACT
We propose a new method to build persistent suffix trees for indexing the genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous work in reducing the random access to the input string and performing only two passes over disk data. DiGeST is based on the two-phase multi-way merge sort paradigm using a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data than managed before.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
R. Clifford, and M.J. Sergot Distributed and paged suffix trees forlarge genetic databases. Proc. of 14th Symposiumon Combinatorial Pattern Matching: 70--82, 2003.
|
| |
5
|
A. Crauser and P. Ferragina A Theoretical and Experimental Study on the Construction of Suffix Arrays in External Memory. Algoritmica, 32(1): 1--35, 2002.
|
| |
6
|
A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White and S.L. Salzberg Alignment of whole genomes. Nucl. Acids. Res., 27(11): 2369--2376, 1999.
|
| |
7
|
A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg Fast algorithms for large-scale genome alignment and comparison. Nucl. Acids. Res., 30(11): 2478--2483, 2002.
|
| |
8
|
R. Dementiev, J. Kärkkäinen, J. Mehnert, and P. Sanders Better externalmemory suffix array construction. Proc. of the 7thWorkshop on Algorithm Engineering and Experiments, 2005.
|
 |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
E. Hunt, M.P. Atkinson, and R.W. Irving A database index to large biological sequences. The VLDB Jornal, 7(3): 139--148, 2001.
|
| |
14
|
J. Kärkkäinen, and P. Sanders Simple linear work suffix array construction. Proc. 13th Int. Conf. on Automata, Languages and Programming, 2003
|
| |
15
|
D.K. Kim, J.S. Sim, H. Park, and K. Park Linear-time construction of suffix arrays: (Extended abstract). Proc. of CPM Conf., LNCS 2676: 189--199, 2003
|
| |
16
|
P. Ko and S. Aluru Space efficient linear time construction of suffix arrays. J. of Discrete Algorithms, 3 (2-4): 143--156, 2005
|
| |
17
|
|
| |
18
|
S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg Versatile and open software for comparing large genomes. Genome Biology, 5(R12): 2004.
|
| |
19
|
N.J. Larsson and K. Sadakane Faster suffix sorting. Tech. Rep. LUCS-TR:99-214 of the Dept. of Comp. Sc., Lund University, Sweden, 1999.
|
| |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
B. Phoophakdee and M.J. Zaki Trellis+: An Effective Approach for Indexing Massive Sequence. Pacific Symposium on Biocomputing, 2008.
|
| |
24
|
E. Pennisi DNA Study Forces Rethink of What It Means to Be a Gene. Science, 316 (5831): 1556--7, 2007
|
 |
25
|
|
| |
26
|
T. Ryan Gregory The evolution of the genome. Academic Press, 2005.
|
| |
27
|
K. Schurmann, J. Stoye Suffix-tree construction and storagewith limited main memory. Tech. Rep. 2003-06 of the Universityof Bielefeld, Germany, 2003.
|
 |
28
|
|
| |
29
|
A. Stark et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature, 450 : 219--232, 2007.
|
| |
30
|
|
| |
31
|
|
| |
32
|
E. Ukkonen On-line construction of suffix trees. Algorithmica, 14(3): 1995.
|
| |
33
|
A. Woolfe et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biology, 3(1), e7 doi:10.1371/journal.pbio.0030007
|
| |
34
|
Strings, Compression, and Orchestra: www.larsson.dogma.net/research.html
|
| |
35
|
USCS Genome Browser: hgdownload.cse.ucsc.edu/downloads.html
|
| |
36
|
Benjarath Pupacdi's Home Page: www.cs.rpi.edu/~zaki/software/trellis
|
|