ACM Home Page
Please provide us with feedback. Feedback
A new method for indexing genomes using on-disk suffix trees
Full text PdfPdf (689 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: DB: indexing and physical query optimization table of contents
Pages 649-658  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Marina Barsky  University of Victoria, Victoria, BC, Canada
Ulrike Stege  University of Victoria, Victoria, BC, Canada
Alex Thomo  University of Victoria, Victoria, BC, Canada
Chris Upton  University of Victoria, Victoria, BC, Canada
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 122,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458170
What is a DOI?

ABSTRACT

We propose a new method to build persistent suffix trees for indexing the genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous work in reducing the random access to the input string and performing only two passes over disk data. DiGeST is based on the two-phase multi-way merge sort paradigm using a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data than managed before.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
R. Clifford, and M.J. Sergot Distributed and paged suffix trees forlarge genetic databases. Proc. of 14th Symposiumon Combinatorial Pattern Matching: 70--82, 2003.
 
5
A. Crauser and P. Ferragina A Theoretical and Experimental Study on the Construction of Suffix Arrays in External Memory. Algoritmica, 32(1): 1--35, 2002.
 
6
A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White and S.L. Salzberg Alignment of whole genomes. Nucl. Acids. Res., 27(11): 2369--2376, 1999.
 
7
A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg Fast algorithms for large-scale genome alignment and comparison. Nucl. Acids. Res., 30(11): 2478--2483, 2002.
 
8
R. Dementiev, J. Kärkkäinen, J. Mehnert, and P. Sanders Better externalmemory suffix array construction. Proc. of the 7thWorkshop on Algorithm Engineering and Experiments, 2005.
9
 
10
 
11
 
12
 
13
E. Hunt, M.P. Atkinson, and R.W. Irving A database index to large biological sequences. The VLDB Jornal, 7(3): 139--148, 2001.
 
14
J. Kärkkäinen, and P. Sanders Simple linear work suffix array construction. Proc. 13th Int. Conf. on Automata, Languages and Programming, 2003
 
15
D.K. Kim, J.S. Sim, H. Park, and K. Park Linear-time construction of suffix arrays: (Extended abstract). Proc. of CPM Conf., LNCS 2676: 189--199, 2003
 
16
P. Ko and S. Aluru Space efficient linear time construction of suffix arrays. J. of Discrete Algorithms, 3 (2-4): 143--156, 2005
 
17
 
18
S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg Versatile and open software for comparing large genomes. Genome Biology, 5(R12): 2004.
 
19
N.J. Larsson and K. Sadakane Faster suffix sorting. Tech. Rep. LUCS-TR:99-214 of the Dept. of Comp. Sc., Lund University, Sweden, 1999.
 
20
 
21
22
 
23
B. Phoophakdee and M.J. Zaki Trellis+: An Effective Approach for Indexing Massive Sequence. Pacific Symposium on Biocomputing, 2008.
 
24
E. Pennisi DNA Study Forces Rethink of What It Means to Be a Gene. Science, 316 (5831): 1556--7, 2007
25
 
26
T. Ryan Gregory The evolution of the genome. Academic Press, 2005.
 
27
K. Schurmann, J. Stoye Suffix-tree construction and storagewith limited main memory. Tech. Rep. 2003-06 of the Universityof Bielefeld, Germany, 2003.
28
 
29
A. Stark et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature, 450 : 219--232, 2007.
 
30
 
31
 
32
E. Ukkonen On-line construction of suffix trees. Algorithmica, 14(3): 1995.
 
33
A. Woolfe et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biology, 3(1), e7 doi:10.1371/journal.pbio.0030007
 
34
Strings, Compression, and Orchestra: www.larsson.dogma.net/research.html
 
35
USCS Genome Browser: hgdownload.cse.ucsc.edu/downloads.html
 
36
Benjarath Pupacdi's Home Page: www.cs.rpi.edu/~zaki/software/trellis

Collaborative Colleagues:
Marina Barsky: colleagues
Ulrike Stege: colleagues
Alex Thomo: colleagues
Chris Upton: colleagues