|
ABSTRACT
We develop a theoretical framework to characterize the hardness of indexing data sets on block-access memory devices like hard disks. We define an indexing workload by a data set and a set of potential queries. For a workload, we can construct an indexing scheme, which is a collection of fixed-sized subsets of the data. We identify two measures of efficiency for an indexing scheme on a workload: storage redundancy, r (how many times each item in the data set is stored), and access overhead, A (how many times more blocks than necessary does a query retrieve).For many interesting families of workloads, there exists a trade-off between storage redundancy and access overhead. Given a desired access overhead A, there is a minimum redundancy that any indexing scheme must exhibit. We prove a lower-bound theorem for deriving the minimum redundancy. By applying this theorem, we show interesting upper and lower bounds and trade-offs between A and r in the case of multidimensional range queries and set queries.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Lars Arge , Vasilis Samoladas , Jeffrey Scott Vitter, On two-dimensional indexability and optimal range search indexing, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.346-357, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
[doi> 10.1145/303976.304010]
|
| |
3
|
|
 |
4
|
|
| |
5
|
BAYER, R., AND MCCREIGHT, E. 1972. Organization and maintenance of large ordered indexes. Acta Inf. 1, 173-189.
|
| |
6
|
|
 |
7
|
Amit Chakrabarti , Bernard Chazelle , Benjamin Gum , Alexey Lvov, A lower bound on the complexity of approximate nearest-neighbor searching on the Hamming cube, Proceedings of the thirty-first annual ACM symposium on Theory of computing, p.305-311, May 01-04, 1999, Atlanta, Georgia, United States
[doi> 10.1145/301250.301325]
|
 |
8
|
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
FIAT, A., AND SHAMIR, A. 1989. How to find a battleship. Networks 19, 361-371.
|
| |
13
|
FREDMAN, M. L. 1980. The inherent complexity of dynamic data structures which accomodate range queries. In Proceedings of the IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 191-199.
|
| |
14
|
FREDMAN, M. L. 1981. Lower bounds on the complexity of some optimal data structures. SIAM J. Comput. 10, 1-10.
|
 |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
JOHNSON, S. M. 1962. A new upper bound for error-correcting codes. IEEE Trans. Inf. Theory 8, 203- 207.
|
| |
19
|
|
 |
20
|
Paris C. Kanellakis , Sridhar Ramaswamy , Darren E. Vengroff , Jeffrey S. Vitter, Indexing for data models with constraints and classes (extended abstract), Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.233-243, May 25-28, 1993, Washington, D.C., United States
[doi> 10.1145/153850.153884]
|
 |
21
|
K. V. Ravi Kanth , Siva Ravada , Jayant Sharma , Jay Banerjee, Indexing medium-dimensionality data in Oracle, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.521-522, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
22
|
|
 |
23
|
Marcel Kornacker , C. Mohan , Joseph M. Hellerstein, Concurrency and recovery in generalized search trees, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.62-72, May 11-15, 1997, Tucson, Arizona, United States
|
 |
24
|
|
| |
25
|
|
| |
26
|
MATOUSEK, J. 1999. Geometric Discrepancy. Springer-Verlag, New York.
|
| |
27
|
|
 |
28
|
Peter Bro Miltersen , Noam Nisan , Shmuel Safra , Avi Wigderson, On data structures and asymmetric communication complexity, Proceedings of the twenty-seventh annual ACM symposium on Theory of computing, p.103-111, May 29-June 01, 1995, Las Vegas, Nevada, United States
[doi> 10.1145/225058.225093]
|
 |
29
|
|
 |
30
|
Mark H. Nodine , Michael T. Goodrich , Jeffrey Scott Vitter, Blocking for external graph searching, Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.222-232, May 25-28, 1993, Washington, D.C., United States
[doi> 10.1145/153850.153880]
|
 |
31
|
Bernd-Uwe Pagel , Hans-Werner Six , Heinrich Toben , Peter Widmayer, Towards an analysis of range query performance in spatial data structures, Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.214-221, May 25-28, 1993, Washington, D.C., United States
[doi> 10.1145/153850.153878]
|
 |
32
|
|
 |
33
|
|
 |
34
|
|
| |
35
|
|
| |
36
|
|
| |
37
|
|
| |
38
|
|
| |
39
|
TURAN, P. 1941. An extrenal problem in graph theory (in Hungarian). Mat. Fiz. Lapok. 48, 435-452.
|
| |
40
|
|
| |
41
|
VAN LINT, J. H., AND WILSON, R. M. 1992. A Course in Combinatorics. Cambridge University Press, Cambridge, Mass.
|
 |
42
|
|
| |
43
|
|
 |
44
|
|
 |
45
|
|
REVIEW
"Fazli Can : Reviewer"
In this paper, the authors introduce a new framework for the modeling of indexing on block-access external devices such as hard disks. The authors define an indexing workload as a data set and the associated set of potential queries. For each work
more...
|