ACM Home Page
Please provide us with feedback. Feedback
Ripple joins for online aggregation
Full text PdfPdf (1.78 MB)
Source International Conference on Management of Data archive
Proceedings of the 1999 ACM SIGMOD international conference on Management of data table of contents
Philadelphia, Pennsylvania, United States
Pages: 287 - 298  
Year of Publication: 1999
ISBN:1-58113-084-8
Also published in ...
Authors
Peter J. Haas  Almaden Research Center, IBM Research Division
Joseph M. Hellerstein  Computer Science Division, University of California, Berkeley
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 39,   Downloads (12 Months): 161,   Citation Count: 70
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/304182.304208
What is a DOI?

ABSTRACT

We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support applications. Traditional offline join algorithms are designed to minimize the time to completion of the query. In contrast, ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available, as measured by the length of a confidence interval. Ripple joins are adaptive, adjusting their behavior during processing in accordance with the statistical properties of the data. Ripple joins also permit the user to dynamically trade off the two key performance factors of on-line aggregation: the time between successive updates of the running aggregate, and the amount by which the confidence-interval length decreases at each update. We show how ripple joins can be implemented in an existing DBMS using iterators, and we give an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters. In experiments with an initial implementation of our algorithms in the POSTGRES DBMS, the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
AS72
 
Bil86
P. Billingsley. Probability and Measure. Wiley, New York, 1986.
 
CGL83
T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing She sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.
DKO+84
GG97
Gra93
 
Haa97
 
HAR99
J. M. Hellerstein, R. Avnur, and V. Raman. informix under CONTROL: Online query processing. Submitted for publication, 1999.
 
HH98
P. J. Haas and J. M. Hellerstein. Join algorithms for online aggregation. IBM Research Report RJ 10126, IBM Almaden Research Center, San Jose, CA, 1998.
HHW97
HN96
HNS94
 
HNSS96
 
Hoe48
W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19:293-325, 1948.
HOT88
HOT89
 
IBM97
IBM Corporation. IBM DB2 Universal Database Administration Guide, Version 5. North York, Ontario, Canada, 1997.
 
Inf97
Informix Corporation. Informix Universal Server G~ide to SQL: Syntax, Version 9.01. Menlo Park, CA, March 1997.
 
ODT+91
G. Ozsoyoglu, K. Du, A. Tjahjana, W. Hou, and D. Y. Rowland. On estimating COUNT, SUM, and AVERAGE relational algebra queries. In D. Dimitris Karagiannis, editor, Database and Expert Systems Applications, Proceedings of the International Conference in Berlin, Germany, 1991 (DEXA 91), pages 406-412. Springer-Verlag, 1991.
OJ93
 
Olk93
F. Olken. Random Sampling .from Databases. Ph.D. Dissertation, University of California, Berkeley, CA, 1993. Available as Tech. Report LBL-32883, Lawrence Berkeley Laboratories, Berkeley, CA.
 
Ora97
Oracle Corporation. Oracle8 Server SQL Reference, Release 8.0. Redwood Shores, CA, June 1997.
 
Pos98
PostgreSQL Home Page, 1998. http://www.postgre,aql .org.
 
RRH99
 
RSS94
 
WA91

CITED BY  70

Collaborative Colleagues:
Peter J. Haas: colleagues
Joseph M. Hellerstein: colleagues