ACM Home Page
Please provide us with feedback. Feedback
On searching compressed string collections cache-obliviously
Full text PdfPdf (244 KB)
Source
Symposium on Principles of Database Systems archive
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Vancouver, Canada
SESSION: Searching & clustering table of contents
Pages 181-190  
Year of Publication: 2008
ISBN:978-1-60558-152-1
Authors
Paolo Ferragina  Università di Pisa, Pisa, Italy
Roberto Grossi  Università di Pisa, Pisa, Italy
Ankur Gupta  Butler University, Indianapolis, IN, USA
Rahul Shah  Louisiana State University, Baton Rouge, LA, USA
Jeffrey Scott Vitter  Purdue University, West Lafayette, IN, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 120,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376916.1376943
What is a DOI?

ABSTRACT

Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and study how close their space occupancy is to the information-theoretic minimum. The moral is that they are not just heuristics. Our second contribution is a novel dictionary encoding scheme that builds upon such linearizations and achieves nearly optimal space, offers competitive I/O-search time, and is also conscious of the query distribution. Finally, we combine those data structures with cache-oblivious tries [2, 5] and obtain a succinct variant whose space is close to the information-theoretic minimum.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
5
6
7
 
8
9
10
 
11
 
12
A. Golynski, R. Grossi, A. Gupta, R. Raman, and S. S. Rao. On the size of succinct indices. In Proc. ESA, LNCS 4698, 371--382, 2007.
 
13
M. He, J. I. Munro, and S. S. Rao. Succinct ordinal trees based on tree covering. In Proc. ICALP, LNCS 4596, 509--520, 2007.
 
14
 
15
 
16
 
17
P. Ko and S. Aluru. Optimal self-adjusting trees for dynamic string data in secondary storage. In Proc. SPIRE, LNCS 4726, 184--194, 2007.
18
 
19
 
20
J. I. Munro. Succinct data structures. Electr. Notes Theor. Comput. Sci., 91(3), 2004.
21
 
22
 
23
F. Ruskey. Combinatorial Generation, 2007. In preparation.
 
24


Collaborative Colleagues:
Paolo Ferragina: colleagues
Roberto Grossi: colleagues
Ankur Gupta: colleagues
Rahul Shah: colleagues
Jeffrey Scott Vitter: colleagues