|
ABSTRACT
The suffix tree (or equivalently, the enhanced suffix array) provides efficient solutions to many problems involving pattern matching and pattern discovery in large strings, such as those arising in computational biology. Here we address the problem of arranging a suffix array on disk so that querying is fast in practice. We show that the combination of a small trie and a suffix array-like blocked data structure allows queries to be answered as much as three times faster than the best alternative disk-based suffix array arrangement. Construction of our data structure requires only modest processing time on top of that required to build the suffix tree, and requires negligible extra memory.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Jour. of Molecular Biology, 215(3):403--410, Oct. 1990.
|
| |
4
|
A. Apostolico. The myriad virtues of subword trees. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, NATO ASI Series F12, pages 85--96. Springer Verlag, Berlin, Germany, 1985.
|
| |
5
|
|
| |
6
|
D. Benson, D. J. Lipman, and J. Ostell. Genbank. Nucleic Acids Research, 21(13):2963--2965, 1993.
|
| |
7
|
D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler. Genbank. Nucleic Acids Research, 35:D21--D25, Jan. 2007.
|
| |
8
|
|
| |
9
|
|
| |
10
|
A. Crauser and P. Ferragina. A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica, 32(1):1--35, Jan. 2002.
|
| |
11
|
R. Dementiev, J. Kärkkäinen, J. Mehnert, and P. Sanders. Better external memory suffix array construction. ACM Jour. of Experimental Algorithmics, 2007. To appear.
|
| |
12
|
Ensembl. Ensembl Genome Browser, 2006. http://www.ensembl.org.
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
M. A. Maniscalco and S. J. Puglisi. Faster lightweight suffix array construction. In J. Ryan and Dafik, editors, Proc. Australasian Workshop on Combinatorial Algorithms, pages 16--29, 2006.
|
| |
19
|
G. Manzini. Two space saving tricks for linear time LCP computation. In T. Hagerup and J. Katajainen, editors, Proceedings of 9th Scandinavian Workshop on Algorithm Theory (SWAT '04), volume 3111 of Lecture Notes in Computer Science, pages 372--383. Springer Verlag, Berlin, Germany, 2004.
|
| |
20
|
|
 |
21
|
|
 |
22
|
|
 |
23
|
|
 |
24
|
|
| |
25
|
J. S. Sim, D. K. Kim, H. Park, and K. Park. Linear-time search in suffix arrays. In M. Miller and K. Park, editors, Proc. Australasian Workshop on Combinatorial Algorithms, pages 139--146, Seoul, Korea, 2003.
|
| |
26
|
B. Smyth. Computing Patterns in Strings. Addison-Wesley, Reading, Massachusetts, USA, 2003.
|
| |
27
|
|
| |
28
|
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249--260, 1995.
|
| |
29
|
P. van Emde-Boas. Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters, 6(3):80--82, 1977.
|
| |
30
|
P. Weiner. Linear pattern matching algorithms. In Proc. Annual Symposium on Foundations of Computer Science, pages 1--11, Silver Spring, MD, USA, 1973. IEEE Computer Society Press.
|
CITED BY
|
|
Marina Barsky , Ulrike Stege , Alex Thomo , Chris Upton, A new method for indexing genomes using on-disk suffix trees, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|