|
ABSTRACT
We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support applications.
Traditional offline join algorithms are designed to minimize the time to completion of the query. In contrast, ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available, as measured by the length of a confidence interval. Ripple joins are adaptive, adjusting their behavior during processing in accordance with the statistical properties of the data. Ripple joins also permit the user to dynamically trade off the two key performance factors of on-line aggregation: the time between successive updates of the running aggregate, and the amount by which the confidence-interval length decreases at each update. We show how ripple joins can be implemented in an existing DBMS using iterators, and we give an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters. In experiments with an initial implementation of our algorithms in the POSTGRES DBMS, the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
AS72
|
|
| |
Bil86
|
P. Billingsley. Probability and Measure. Wiley, New York, 1986.
|
| |
CGL83
|
T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing She sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.
|
 |
DKO+84
|
David J DeWitt , Randy H Katz , Frank Olken , Leonard D Shapiro , Michael R Stonebraker , David Wood, Implementation techniques for main memory database systems, Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts
|
 |
GG97
|
|
 |
Gra93
|
|
| |
Haa97
|
|
| |
HAR99
|
J. M. Hellerstein, R. Avnur, and V. Raman. informix under CONTROL: Online query processing. Submitted for publication, 1999.
|
| |
HH98
|
P. J. Haas and J. M. Hellerstein. Join algorithms for online aggregation. IBM Research Report RJ 10126, IBM Almaden Research Center, San Jose, CA, 1998.
|
 |
HHW97
|
Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States
|
 |
HN96
|
|
 |
HNS94
|
Peter J. Haas , Jeffrey F. Naughton , Arun N. Swami, On the relative cost of sampling for join selectivity estimation, Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.14-24, May 24-27, 1994, Minneapolis, Minnesota, United States
[doi> 10.1145/182591.182594]
|
| |
HNSS96
|
|
| |
Hoe48
|
W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19:293-325, 1948.
|
 |
HOT88
|
|
 |
HOT89
|
|
| |
IBM97
|
IBM Corporation. IBM DB2 Universal Database Administration Guide, Version 5. North York, Ontario, Canada, 1997.
|
| |
Inf97
|
Informix Corporation. Informix Universal Server G~ide to SQL: Syntax, Version 9.01. Menlo Park, CA, March 1997.
|
| |
ODT+91
|
G. Ozsoyoglu, K. Du, A. Tjahjana, W. Hou, and D. Y. Rowland. On estimating COUNT, SUM, and AVERAGE relational algebra queries. In D. Dimitris Karagiannis, editor, Database and Expert Systems Applications, Proceedings of the International Conference in Berlin, Germany, 1991 (DEXA 91), pages 406-412. Springer-Verlag, 1991.
|
 |
OJ93
|
|
| |
Olk93
|
F. Olken. Random Sampling .from Databases. Ph.D. Dissertation, University of California, Berkeley, CA, 1993. Available as Tech. Report LBL-32883, Lawrence Berkeley Laboratories, Berkeley, CA.
|
| |
Ora97
|
Oracle Corporation. Oracle8 Server SQL Reference, Release 8.0. Redwood Shores, CA, June 1997.
|
| |
Pos98
|
PostgreSQL Home Page, 1998. http://www.postgre,aql .org.
|
| |
RRH99
|
|
| |
RSS94
|
|
| |
WA91
|
|
CITED BY 70
|
|
|
|
|
|
|
|
Vladimir Zadorozhny , Louiqa Raschid , Maria Esther Vidal , Tolga Urhan , Laura Bright, Efficient evaluation of queries in a mediator for WebSources, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 03-06, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
Joseph M. Hellerstein , Ron Avnur , Andy Chou , Christian Hidber , Chris Olston , Vijayshankar Raman , Tali Roth , Peter J. Haas, Interactive Data Analysis: The Control Project, Computer, v.32 n.8, p.51-59, August 1999
|
|
|
|
|
|
Ihab F. Ilyas , Rahul Shah , Walid G. Aref , Jeffrey Scott Vitter , Ahmed K. Elmagarmid, Rank-aware query optimization, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jens-Peter Dittrich , Bernhard Seeger , David Scot Taylor , Peter Widmayer, On producing join results early, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.134-142, June 09-11, 2003, San Diego, California
|
|
|
Christopher Jermaine , Alin Dobra , Subramanian Arumugam , Shantanu Joshi , Abhijit Pol, A disk-based join with probabilistic guarantees, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
Ihab F. Ilyas , Walid G. Aref , Ahmed K. Elmagarmid , Hicham G. Elmongui , Rahul Shah , Jeffrey Scott Vitter, Adaptive rank-aware query optimization in relational databases, ACM Transactions on Database Systems (TODS), v.31 n.4, p.1257-1304, December 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Moustafa A. Hammad , Michael J. Franklin , Walid G. Aref , Ahmed K. Elmagarmid, Scheduling for shared window joins over data streams, Proceedings of the 29th international conference on Very large data bases, p.297-308, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
Jens-Peter Dittrich , Bernhard Seeger , David Scot Taylor , Peter Widmayer, Progressive merge join: a generic and non-blocking sort-based join algorithm, Proceedings of the 28th international conference on Very Large Data Bases, p.299-310, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Florin Rusu , Fei Xu , Luis Leopoldo Perez , Mingxi Wu , Ravi Jampani , Chris Jermaine , Alin Dobra, The DBO database system, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Katja Hose , Armin Roth , André Zeitz , Kai-Uwe Sattler , Felix Naumann, A research agenda for query processing in large-scale peer data management systems, Information Systems, v.33 n.7-8, p.597-610, November, 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|