ACM Home Page
Please provide us with feedback. Feedback
Synopses for query optimization: a space-complexity perspective
Full text PdfPdf (232 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Paris, France
SESSION: Query execution and optimization table of contents
Pages: 201 - 209  
Year of Publication: 2004
ISBN:158113858X
Authors
Raghav Kaushik  Microsoft Research
Raghu Ramakrishnan  University of Wisconsin-Madison
Venkatesan T. Chakaravarthy  University of Wisconsin-Madison
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 26,   Citation Count: 0
Additional Information:

abstract   references   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1055558.1055586
What is a DOI?

ABSTRACT

Database systems use precomputed synopses of data to estimate the cost of alternative plans during query optimization. A number of alternative synopsis structures have been proposed, but histograms are by far the most commonly used. While histograms have proved to be very effective in (cost estimation for) single-table selections, queries with joins have long been seen as a challenge; under a model where histograms are maintained for individual tables, a celebrated result of Ioannidis and Christodoulakis observes that errors propagate exponentially with the number of joins in a query.In this paper, we make two main contributions. First, we study the space complexity of using synopses for query optimization from a novel information-theoretic perspective. In particular, we offer evidence in support of histograms for single-table selections, and illustrate their limitations for join queries. Second, for a broad class of common queries involving joins (specifically, all queries involving only key-foreign key joins) we show that the strategy of storing a small pre-computed sample of the database yields probabilistic guarantees that are almost space-optimal, in the sense that in order to provide the same guarantee as sampling, any strategy requires almost the same amount of space. This is an important property if these samples are to be used as database statistics. This is the first such optimality result, to our knowledge, and suggests that pre-computed samples might be an effective way to circumvent the error propagation problem for queries with key-foreign key joins. We support this result empirically through an experimental study that demonstrates the effectiveness of pre-computed samples, and also shows the increasing difference in the effectiveness of samples versus multi-dimensional histograms as the number of joins in the query grows.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
5
6
 
7
8
9
 
10
11
 
12
T. Gasser, J. Engel, and B. Seifert. Non-parametric. density estimation. In Ann. Stat., 1985.
 
13
14
15
16
 
17
 
18
Y. E. Ioannidis. The history of histograms. In VLDB, 2003.
19
20
 
21
22
 
23
24
25
 
26
 
27
C. Papadimitriou. Computational Complexity. Addison Wesley, 1994.
28
29
30
31
32
 
33
34
Collaborative Colleagues:
Raghav Kaushik: colleagues
Raghu Ramakrishnan: colleagues
Venkatesan T. Chakaravarthy: colleagues