| A categorization theorem on suffix arrays with applications to space efficient text indexes |
| Full text |
Pdf
(996 KB)
|
| Source
|
Symposium on Discrete Algorithms
archive
Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
table of contents
Vancouver, British Columbia
SESSION: Session 1A
table of contents
Pages: 23 - 32
Year of Publication: 2005
ISBN:0-89871-585-7
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
Society for Industrial and Applied Mathematics
Philadelphia, PA, USA
|
| Bibliometrics |
Downloads (6 Weeks): 2, Downloads (12 Months): 39, Citation Count: 3
|
|
|
ABSTRACT
In this paper, we design succinct index structures for a text string T of n binary symbols to support efficient searching of a pattern P of length m. Motivated by the fact that the standard representation of suffix arrays uses n lg n bits which is more than the theoretical minimum, we present a theorem that characterizes a permutation as the suffix array of a binary string. Based on the theorem, we design a succinct representation of suffix arrays of binary strings that uses n + o(n) bits, which is the theoretical minimum plus a lower order term, and answers existential and cardinality queries in O(m) time without storing the raw text. With 2n+o(n) bits, we can list pattern occurrences in O(m + occ lg n) time in the general case, and for long patterns, when m = Ω(lg1+∈ n), we answer such listing queries in O(m + occ) time. We also present another implementation that uses O(n) bits and supports pattern searching in O(m + occ lgλ n) time for any fixed λ such that 0 < λ < 1. More results and trade-offs are reported in the paper.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
S. Burkhardt and J. Kärkkäinen. Fast light suffix array construction and checking. In Proceedings of the 14th Symposium on Combinatorial Pattern Matching, pages 55--69. Springer-Verlag LNCS 2676, 2003.
|
| |
3
|
M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
R. Grossi and J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. Manuscript, http://www.cs.duke.edu/~jsv/Papers/GrV00.text_indexing_slides.ps.gz.
|
 |
11
|
|
| |
12
|
Donna Harman , R. Baeza-Yates , Edward Fox , W. Lee, Inverted files, Information retrieval: data structures and algorithms, Prentice-Hall, Inc., Upper Saddle River, NJ, 1992
|
| |
13
|
G. Jacobson. Space-efficient static trees and graphs. In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, pages 549--554, 1989.
|
| |
14
|
D. K. Kim, J. S. Sim, H. Park, and K. Park. Linear-time construction of suffix arrays. In Proceedings of the 14th Symposium on Combinatorial Pattern Matching, pages 186--199. Springer-Verlag LNCS 2676, 2003.
|
| |
15
|
|
| |
16
|
P. Ko and S. Aluru. Space efficient linear time construction of suffix arrays. In Proceedings of the 14th Symposium on Combinatorial Pattern Matching, pages 200--210. Springer-Verlag LNCS 2676, 2003.
|
| |
17
|
|
| |
18
|
J. I. Munro. Succinct data structures. In Proceedings of Workshop on Data Structures, pages 3--7, 1999.
|
| |
19
|
|
| |
20
|
|
| |
21
|
P. Weiner. Linear pattern matching algorithm. In Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, pages 1--11, 1973.
|
CITED BY 3
|
|
|
|
|
|
|
|
Jérémy Barbay , Meng He , J. Ian Munro , S. Srinivasa Rao, Succinct indexes for strings, binary relations and multi-labeled trees, Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, p.680-689, January 07-09, 2007, New Orleans, Louisiana
|
|