|
ABSTRACT
In 1990, Manber and Myers proposed suffix arrays as a space-saving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few years, suffix array construction algorithms have proliferated in bewildering abundance. This survey paper attempts to provide simple high-level descriptions of these numerous algorithms that highlight both their distinctive features and their commonalities, while avoiding as much as possible the complexities of implementation details. New hybrid algorithms are also described. We provide comparisons of the algorithms' worst-case time complexity and use of additional space, together with results of recent experimental test runs on many of their implementations.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Apostolico, A. 1985. The myriad virtues of subword trees. In Combinatorial Algorithms on Words. NATO ASI Series F12. Springer-Verlag, Berlin, Germany, 85--96.
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
Burkhardt, S. and Kärkkäinen, J. 2003. Fast lightweight suffix array construction and checking. In Proceedings of the 14th Annual Symposium CPM 2003, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 55--69.
|
| |
7
|
Burrows, M. and Wheeler, D. J. 1994. A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation, Palo Alto, CA.
|
| |
8
|
Crauser, A. and Ferragina, P. 2002. A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica 32, 1--35.
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
Hart, M. 1997. Project Gutenberg. http://www.gutenberg.net.
|
| |
13
|
|
| |
14
|
|
| |
15
|
Kärkkäinen, J. and Sanders, P. 2003. Simple linear work suffix array construction. In Proceedings of the 30th International Colloquium Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 2971. Springer-Verlag, Berlin, Germany, 943--955.
|
 |
16
|
|
 |
17
|
Richard M. Karp , Raymond E. Miller , Arnold L. Rosenberg, Rapid identification of repeated patterns in strings, trees and arrays, Proceedings of the fourth annual ACM symposium on Theory of computing, p.125-136, May 01-03, 1972, Denver, Colorado, United States
[doi> 10.1145/800152.804905]
|
| |
18
|
|
| |
19
|
Khmelev, D. V. 2003. Program suffsort version 0.1.6. http://www.math.toronto.edu/dkhmelev/PROGS/tacu/suffsort-eng.html.
|
| |
20
|
Kim, D. K., Jo, J., and Park, H. 2004. A fast algorithm for constructing suffix arrays for fixed-size alphabets. In Proceedings of the 3rd Workshop on Experimental and Efficient Algorithms (WEA 2004), C. C. Ribeiro and S. L. Martins, Eds. Springer-Verlag, Berlin, Germany, 301--314.
|
| |
21
|
Kim, D. K., Sim, J. S., Park, H., and Park, K. 2003. Linear-time construction of suffix arrays. In Proceedings of the 14th Annual Symposium Combinatorial Pattern Matching, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 186--199.
|
| |
22
|
Kim, D. K., Sim, J. S., Park, H., and Park, K. 2005. Constructing suffix arrays in linear time. J. Discrete Algorithms 3, 126--142.
|
| |
23
|
Ko, P. 2006. Linear time suffix array. http://www.public.iastate.edu/~kopang/progRelease/homepage.html.
|
| |
24
|
Ko, P. and Aluru, S. 2003. Space efficient linear time construction of suffix arrays. In Proceedings of the 14th Annual Symposium CPM 2003, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 200--210.
|
| |
25
|
Ko, P. and Aluru, S. 2005. Space efficient linear time construction of suffix arrays. J. Disc. Algor. 3, 143--156.
|
| |
26
|
|
| |
27
|
Larsson, J. N. and Sadakane, K. 1999. Faster suffix sorting. Tech. Rep. LU-CS-TR:99-214 {LUNFD6/(NFCS-3140)/1-20/(1999)}, Department of Computer Science, Lund University, Sweden.
|
| |
28
|
Lee, S. and Park, K. 2004. Efficient implementations of suffix array construction algorithms. In AWOCA 2004: Proceedings of the 15th Australasian Workshop on Combinatorial Algorithms, S. Hong, Ed. 64--72.
|
| |
29
|
Malyshev, D. 2006. DARK the universal archiver based on BWT-DC scheme. http://darchiver.narod.ru/.
|
| |
30
|
|
| |
31
|
|
| |
32
|
Maniscalco, M. A. 2005. MSufSort. http://www.michael-maniscalco.com/msufsort.htm.
|
| |
33
|
Maniscalco, M. A. and Puglisi, S. J. 2006. Faster lightweight suffix array construction. In Proceedings of 17th Australasian Workshop on Combinatorial Algorithms, J. Ryan and Dafik, Eds. Univ. Ballavat, Ballavat, Victoria, Australia, 16--29.
|
| |
34
|
Maniscalco, M. A. and Puglisi, S. J. 2007. An efficient, versatile approach to suffix sorting. ACM J. Experiment. Algor. To appear.
|
| |
35
|
Manzini, G. 2004. Two space saving tricks for linear time LCP computation. In Proceedings of 9th Scandinavian Workshop on Algorithm Theory (SWAT '04), T. Hagerup and J. Katajainen, Eds. Lecture Notes in Computer Science, vol. 3111. Springer-Verlag, Berlin, Germany, 372--383.
|
| |
36
|
|
| |
37
|
McIlroy, M. D. 1997. ssort.c. http://cm.bell-labs.com/cm/cs/who/doug/source.html.
|
| |
38
|
McIlroy, P. M., Bostic, K., and McIlroy, M. D. 1993. Engineering radix sort. Comput. Syst. 6, 1, 5--27.
|
| |
39
|
Mori, Y. 2006. DivSufSort. http://www.homepage3.nifty.com/wpage/software/libdivsufsort.html.
|
| |
40
|
|
| |
41
|
Na, J. C. 2005. Linear-time construction of compressed suffix arrays using O(nlogn)-bit working space for large alphabets. In Proceedings of the 16th Annual Symposium Combinatorial Pattern Matching, A. Apostolico, M. Crochemore, and K. Park, Eds. Lecture Notes in Computer Science, vol. 3537. Springer-Verlag, Berlin, Germany, 57--67.
|
 |
42
|
|
| |
43
|
|
| |
44
|
|
| |
45
|
Schürmann, K. and Stoye, J. 2005. An incomplex algorithm for fast suffix array construction. In Proceedings of the 7th Workshop on Algorithm Engineering and Experiments (ALENEX05). SIAM, 77--85.
|
| |
46
|
|
| |
47
|
Sim, J. S., Kim, D. K., Park, H., and Park, K. 2003. Linear-time search in suffix arrays. In Proceedings of the 14th Australasian Workshop on Combinatorial Algorithms, M. Miller and K. Park, Eds. (Seoul, Korea), 139--146.
|
 |
48
|
|
| |
49
|
Smyth, B. 2003. Computing Patterns in Strings. Pearson Addison-Wesley, Essex, England.
|
CITED BY 6
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Marina Barsky , Ulrike Stege , Alex Thomo , Chris Upton, A new method for indexing genomes using on-disk suffix trees, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
REVIEW
"Neil D Burgess : Reviewer"
The existing body of knowledge in the rather esoteric area of suffix arrays is summarized in this paper. While it is quite demanding reading, it does make a useful contribution by publishing the results of a series of measured executions. Su
more...
|