ACM Home Page
Please provide us with feedback. Feedback
Improving suffix array locality for fast pattern matching on disk
Full text PdfPdf (255 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Research Session 14: Ordered Data table of contents
Pages 661-672  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Ranjan Sinha  The University of Melbourne, Melbourne, Australia
Simon Puglisi  RMIT University, Melbourne, Australia
Alistair Moffat  The University of Melbourne, Melbourne, Australia
Andrew Turpin  RMIT University, Melbourne, Australia
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 225,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376683
What is a DOI?

ABSTRACT

The suffix tree (or equivalently, the enhanced suffix array) provides efficient solutions to many problems involving pattern matching and pattern discovery in large strings, such as those arising in computational biology. Here we address the problem of arranging a suffix array on disk so that querying is fast in practice. We show that the combination of a small trie and a suffix array-like blocked data structure allows queries to be answered as much as three times faster than the best alternative disk-based suffix array arrangement. Construction of our data structure requires only modest processing time on top of that required to build the suffix tree, and requires negligible extra memory.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Jour. of Molecular Biology, 215(3):403--410, Oct. 1990.
 
4
A. Apostolico. The myriad virtues of subword trees. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, NATO ASI Series F12, pages 85--96. Springer Verlag, Berlin, Germany, 1985.
 
5
 
6
D. Benson, D. J. Lipman, and J. Ostell. Genbank. Nucleic Acids Research, 21(13):2963--2965, 1993.
 
7
D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler. Genbank. Nucleic Acids Research, 35:D21--D25, Jan. 2007.
 
8
 
9
 
10
A. Crauser and P. Ferragina. A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica, 32(1):1--35, Jan. 2002.
 
11
R. Dementiev, J. Kärkkäinen, J. Mehnert, and P. Sanders. Better external memory suffix array construction. ACM Jour. of Experimental Algorithmics, 2007. To appear.
 
12
Ensembl. Ensembl Genome Browser, 2006. http://www.ensembl.org.
 
13
 
14
 
15
 
16
 
17
 
18
M. A. Maniscalco and S. J. Puglisi. Faster lightweight suffix array construction. In J. Ryan and Dafik, editors, Proc. Australasian Workshop on Combinatorial Algorithms, pages 16--29, 2006.
 
19
G. Manzini. Two space saving tricks for linear time LCP computation. In T. Hagerup and J. Katajainen, editors, Proceedings of 9th Scandinavian Workshop on Algorithm Theory (SWAT '04), volume 3111 of Lecture Notes in Computer Science, pages 372--383. Springer Verlag, Berlin, Germany, 2004.
 
20
21
22
23
24
 
25
J. S. Sim, D. K. Kim, H. Park, and K. Park. Linear-time search in suffix arrays. In M. Miller and K. Park, editors, Proc. Australasian Workshop on Combinatorial Algorithms, pages 139--146, Seoul, Korea, 2003.
 
26
B. Smyth. Computing Patterns in Strings. Addison-Wesley, Reading, Massachusetts, USA, 2003.
 
27
 
28
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249--260, 1995.
 
29
P. van Emde-Boas. Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters, 6(3):80--82, 1977.
 
30
P. Weiner. Linear pattern matching algorithms. In Proc. Annual Symposium on Foundations of Computer Science, pages 1--11, Silver Spring, MD, USA, 1973. IEEE Computer Society Press.


Collaborative Colleagues:
Ranjan Sinha: colleagues
Simon Puglisi: colleagues
Alistair Moffat: colleagues
Andrew Turpin: colleagues