ACM Home Page
Please provide us with feedback. Feedback
A categorization theorem on suffix arrays with applications to space efficient text indexes
Full text PdfPdf (996 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
Vancouver, British Columbia
SESSION: Session 1A table of contents
Pages: 23 - 32  
Year of Publication: 2005
ISBN:0-89871-585-7
Authors
Meng He  University of Waterloo, Waterloo, ON, Canada
J. Ian Munro  University of Waterloo, Waterloo, ON, Canada
S. Srinivasa Rao  University of Waterloo, Waterloo, ON, Canada
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
: SIAM Activity Group on Discrete Mathematics
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 39,   Citation Count: 3
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

In this paper, we design succinct index structures for a text string T of n binary symbols to support efficient searching of a pattern P of length m. Motivated by the fact that the standard representation of suffix arrays uses n lg n bits which is more than the theoretical minimum, we present a theorem that characterizes a permutation as the suffix array of a binary string. Based on the theorem, we design a succinct representation of suffix arrays of binary strings that uses n + o(n) bits, which is the theoretical minimum plus a lower order term, and answers existential and cardinality queries in O(m) time without storing the raw text. With 2n+o(n) bits, we can list pattern occurrences in O(m + occ lg n) time in the general case, and for long patterns, when m = Ω(lg1+∈ n), we answer such listing queries in O(m + occ) time. We also present another implementation that uses O(n) bits and supports pattern searching in O(m + occ lgλ n) time for any fixed λ such that 0 < λ < 1. More results and trade-offs are reported in the paper.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
S. Burkhardt and J. Kärkkäinen. Fast light suffix array construction and checking. In Proceedings of the 14th Symposium on Combinatorial Pattern Matching, pages 55--69. Springer-Verlag LNCS 2676, 2003.
 
3
M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, 1994.
 
4
 
5
 
6
 
7
 
8
 
9
 
10
R. Grossi and J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. Manuscript, http://www.cs.duke.edu/~jsv/Papers/GrV00.text_indexing_slides.ps.gz.
11
 
12
 
13
G. Jacobson. Space-efficient static trees and graphs. In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, pages 549--554, 1989.
 
14
D. K. Kim, J. S. Sim, H. Park, and K. Park. Linear-time construction of suffix arrays. In Proceedings of the 14th Symposium on Combinatorial Pattern Matching, pages 186--199. Springer-Verlag LNCS 2676, 2003.
 
15
 
16
P. Ko and S. Aluru. Space efficient linear time construction of suffix arrays. In Proceedings of the 14th Symposium on Combinatorial Pattern Matching, pages 200--210. Springer-Verlag LNCS 2676, 2003.
 
17
 
18
J. I. Munro. Succinct data structures. In Proceedings of Workshop on Data Structures, pages 3--7, 1999.
 
19
 
20
 
21
P. Weiner. Linear pattern matching algorithm. In Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, pages 1--11, 1973.

Collaborative Colleagues:
Meng He: colleagues
J. Ian Munro: colleagues
S. Srinivasa Rao: colleagues