Join synopses for approximate query answering
Source: International Conference on Management of Data
Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data
Philadelphia, Pennsylvania, United States
Pages: 275 - 286  
Year of Publication: 1999
ISBN: 1-58113-084-8
Authors
Swarup Acharya  Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
Phillip B. Gibbons  Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
Viswanath Poosala  Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
Sridhar Ramaswamy  Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/304182.304207

ABSTRACT

In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate answers for join queries using only statistics (in particular, samples) from the base relations. We propose join synopses as an effective solution to this problem and show how precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign key joins. We present optimal strategies for allocating the available space among the various join synopses when the query workload is known, and identify heuristics for the common case when the workload is not known. We also present efficient algorithms for incrementally maintaining join synopses in the presence of updates to the base relations. Our extensive set of experiments on the TPC-D benchmark database shows the effectiveness of join synopses and of the other techniques proposed in this paper.
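The contrast the abstract draws between joining samples of the base relations and precomputing a join synopsis can be illustrated with a small sketch. It is not the paper's implementation. The two-table schema (`orders` referencing `customers` through `cust_id`), every function name, and the scale-up estimator in `estimate_sum` are assumptions made for illustration; the abstract does not specify the estimator or the construction details.

```python
import random

def uniform_sample(rows, k, rng):
    """Draw k rows uniformly at random without replacement."""
    return rng.sample(rows, k)

def fk_join(left_rows, right_rows, key):
    """Join each left row to the single right row sharing its key value."""
    index = {r[key]: r for r in right_rows}
    return [{**l, **index[l[key]]} for l in left_rows if l[key] in index]

def build_tables(n_orders=1000, n_customers=100, seed=0):
    """Hypothetical schema: orders.cust_id is a foreign key into customers."""
    rng = random.Random(seed)
    customers = [{"cust_id": c, "region": rng.choice(["east", "west"])}
                 for c in range(n_customers)]
    orders = [{"order_id": o, "cust_id": rng.randrange(n_customers),
               "amount": rng.randint(1, 100)} for o in range(n_orders)]
    return orders, customers

def join_of_samples(orders, customers, frac, rng):
    """Sample each base relation independently, then join the samples.
    Most sampled orders find no matching sampled customer."""
    so = uniform_sample(orders, int(len(orders) * frac), rng)
    sc = uniform_sample(customers, int(len(customers) * frac), rng)
    return fk_join(so, sc, "cust_id")

def join_synopsis(orders, customers, k, rng):
    """Sample the referencing relation, then join it with the FULL
    referenced relation, so every sampled order keeps its match."""
    so = uniform_sample(orders, k, rng)
    return fk_join(so, customers, "cust_id")

def estimate_sum(synopsis, n_source, k, predicate, column):
    """Assumed estimator: scale the sample sum by n_source / k."""
    return (n_source / k) * sum(r[column] for r in synopsis if predicate(r))

if __name__ == "__main__":
    rng = random.Random(1)
    orders, customers = build_tables()
    syn = join_synopsis(orders, customers, 50, rng)
    jos = join_of_samples(orders, customers, 0.05, rng)
    print("join synopsis rows:", len(syn))
    print("join-of-samples rows:", len(jos))
    est = estimate_sum(syn, len(orders), 50,
                       lambda r: r["region"] == "east", "amount")
    print("estimated SUM(amount) for region 'east':", est)
```

In this sketch the join synopsis always retains all 50 sampled orders, because each order has exactly one matching customer, whereas the join of two independent 5% samples keeps only the orders whose customer also happened to be sampled.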

