ACM Home Page
Please provide us with feedback. Feedback
Better external memory suffix array construction
Full text PdfPdf (515 KB)
Source Journal of Experimental Algorithmics (JEA) archive
Volume 12 ,  (June 2008) table of contents
SECTION: SECTION: 3 - Special Issue table of contents
Article No. 3.4  
Year of Publication: 2008
ISSN:1084-6654
Authors
Roman Dementiev  Universität Karlsruhe, Karlsruhe, Germany
Juha Kärkkäinen  University of Helsinki, Finland
Jens Mehnert  Max-Planck-Institut für Informatik, Saarbrücken, Germany
Peter Sanders  Universität Karlsruhe, Karlsruhe, Germany
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 145,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1227161.1402296
What is a DOI?

ABSTRACT

Suffix arrays are a simple and powerful data structure for text processing that can be used for full text indexes, data compression, and many other applications, in particular, in bioinformatics. However, so far, it has appeared prohibitive to build suffix arrays for huge inputs that do not fit into main memory. This paper presents design, analysis, implementation, and experimental evaluation of several new and improved algorithms for suffix array construction. The algorithms are asymptotically optimal in the worst case or on average. Our implementation can construct suffix arrays for inputs of up to 4-GB in hours on a low-cost machine. As a tool of possible independent interest, we present a systematic way to design, analyze, and implement pipelined algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
 
4
 
5
Burkhardt, S. and Kärkkäinen, J. 2003. Fast lightweight suffix array construction and checking. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. LNCS, vol. 2676. Springer, New York. 55--69.
 
6
Burrows, M. and Wheeler, D. J. 1994. A block-sorting lossless data compression algorithm. Technical Report 124, SRC (digital, Palo Alto) (May).
 
7
 
8
Crauser, A. and Ferragina, P. 2002. Theoretical and experimental study on the construction of suffix-arrays in external memory. Algorithmica 32, 1, 1--35.
 
9
Crauser, A. and Mehlhorn, K. 1998. LEDA-SM a platform for secondary memory computations. Tech. rep., MPII. draft.
 
10
Dementiev, R. 2005. The Stxxl library. Documentation and download at http://stxxl.sourceforge.net.
11
 
12
Dementiev, R., Kettner, L., and Sanders, P. 2005. Stxxl: Standard Template Library for XXL Data Sets. In Proc. 13th Annual European Symposium on Algorithms. LNCS, vol. 3669. 640--651.
13
 
14
 
15
Gropp, W., Lusk, R., and Thakur, R. 1998. Latest Advances in MPI-2. Tutorial on EuroPVM/MPI'98.
 
16
Haanpää, H. 2004. Minimum sum and difference covers of Abelian groups. Journal of Integer Sequences 7, 2, article 04.1.8.
 
17
Hon, W.-K., Lam, T.-W., Sadakane, K., and Sung, W.-K. 2003a. Constructing compressed suffix-arrays with large alphabets. In Proc. 14th International Symposium on Algorithms and Computation. LNCS, vol. 2906. Springer, New York. 240--249.
 
18
 
19
20
 
21
Kim, D. K., Sim, J. S., Park, H., and Park, K. 2003. Linear-time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. LNCS, vol. 2676. Springer, New York. 186--199.
 
22
Ko, P. and Aluru, S. 2003. Space efficient linear-time construction of suffix-arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. LNCS, vol. 2089. Springer, New York. 200--210.
 
23
Kulla, F. and Sanders, P. 2006. Scalable parallel suffix array construction. In EuroPVM/MPI. to appear.
 
24
 
25
 
26
 
27
Sadakane, K. and Shibuya, T. 2001. Indexing huge genome sequences for solving various problems. Genome Informatics 12, 175--183.
 
28
 
29
Vitter, J. S. and Shriver, E. A. M. 1994. Algorithms for parallel memory, I: Two level memories. Algorithmica 12, 2/3, 110--147.
 
30
Weese, D. 2006. Entwurf und Implementierung eines generischen Substring-Index. M.S. thesis, Freie Universität Berlin.

Collaborative Colleagues:
Roman Dementiev: colleagues
Juha Kärkkäinen: colleagues
Jens Mehnert: colleagues
Peter Sanders: colleagues