Secondary indexing in one dimension: beyond b-trees and bitmap indexes
Source
Symposium on Principles of Database Systems
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Providence, Rhode Island, USA
SESSION: Indexing
Pages 177-186
Year of Publication: 2009
ISBN: 978-1-60558-553-6
Authors
Rasmus Pagh  IT University of Copenhagen, Copenhagen, Denmark
Srinivasa Rao Satti  Seoul National University, Seoul, South Korea
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1559795.1559824

ABSTRACT

Let $\Sigma$ be a finite, ordered alphabet, and consider a string $x = x_1 x_2 \ldots x_n \in \Sigma^n$. A secondary index for $x$ answers alphabet range queries of the form: Given a range $[\alpha_l, \alpha_r] \subseteq \Sigma$, return the set $I_{[\alpha_l, \alpha_r]} = \{ i \mid x_i \in [\alpha_l, \alpha_r] \}$. Secondary indexes are heavily used in relational databases and scientific data analysis. It is well known that the obvious solution, storing a dictionary for the set $\bigcup_i \{x_i\}$ with a position set associated with each character, does not always give optimal query time. In this paper we give the first theoretically optimal data structure for the secondary indexing problem. In the I/O model, the amount of data read when answering a query is within a constant factor of the minimum space needed to represent the set $I_{[\alpha_l, \alpha_r]}$, assuming that the size of internal memory is $(|\Sigma| \lg n)^{\delta}$ blocks, for some constant $\delta > 0$. The space usage of the data structure is $O(n \lg |\Sigma|)$ bits in the worst case, and we further show how to bound the size of the data structure in terms of the 0th order entropy of $x$. We show how to support updates achieving various time-space trade-offs.
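As a point of reference, the "obvious solution" mentioned in the abstract can be sketched in a few lines of Python. This is only an in-memory illustration of the problem statement, not the data structure from the paper: it ignores the I/O model and space bounds entirely, and the names `build_index` and `range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(x):
    """Return (sorted distinct characters, char -> sorted 1-based positions)."""
    positions = defaultdict(list)
    for i, ch in enumerate(x, start=1):
        # Positions are appended in increasing order, so each list stays sorted.
        positions[ch].append(i)
    return sorted(positions), dict(positions)


def range_query(index, lo, hi):
    """Return all positions i with lo <= x_i <= hi, in increasing order."""
    chars, positions = index
    # Binary search for the block of distinct characters inside [lo, hi].
    start = bisect_left(chars, lo)
    end = bisect_right(chars, hi)
    result = []
    for ch in chars[start:end]:
        result.extend(positions[ch])
    # Merge the per-character position sets into one sorted answer.
    return sorted(result)


index = build_index("banana")
print(range_query(index, "a", "b"))
```

The query touches one position list per distinct character in the range, which hints at why this approach can do more work than the size of the answer requires when the range covers many characters.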

We also consider an approximate version of the basic secondary indexing problem where a query reports a superset of $I_{[\alpha_l, \alpha_r]}$ containing each element not in $I_{[\alpha_l, \alpha_r]}$ with probability at most $\varepsilon$, where $\varepsilon > 0$ is the false positive probability. For this problem the amount of data that needs to be read by the query algorithm is reduced to $O(|I_{[\alpha_l, \alpha_r]}| \lg(1/\varepsilon))$ bits.
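To see what the approximate bound can save, here is a rough worked comparison that is not taken from the paper. It assumes the "minimum space needed to represent the set" is the standard counting bound for a $k$-element subset of $\{1, \ldots, n\}$, where $k$ denotes the size of the answer set. The values of $n$, $k$, and $\varepsilon$ are hypothetical, and the constant hidden by the $O(\cdot)$ is ignored.

```latex
\begin{align*}
\text{exact:} \quad & \lg \binom{n}{k} \;\ge\; k \lg \frac{n}{k} \\
\text{approximate:} \quad & O\!\left(k \lg \frac{1}{\varepsilon}\right) \\
\text{e.g. } n = 2^{30},\ k = 2^{10},\ \varepsilon = 2^{-7}: \quad
  & k \lg \frac{n}{k} = 2^{10} \cdot 20 = 20480 \text{ bits}, \qquad
    k \lg \frac{1}{\varepsilon} = 2^{10} \cdot 7 = 7168 \text{ bits}
\end{align*}
```

Under these assumptions the approximate query reads fewer bits whenever $1/\varepsilon$ is smaller than $n/k$, up to constant factors.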

