|
ABSTRACT
Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in POSTGRES.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
AAD+96
|
Sameet Agarwal , Rakesh Agrawal , Prasad Deshpande , Ashish Gupta , Jeffrey F. Naughton , Raghu Ramakrishnan , Sunita Sarawagi, On the Computation of Multidimensional Aggregates, Proceedings of the 22th International Conference on Very Large Data Bases, p.506-521, September 03-06, 1996
|
| |
ACSW96
|
|
| |
AZ96
|
|
| |
Bil86
|
P. Billingsley. Probability and Measure. Wiley, New York, second edition, 1986.
|
 |
BM96
|
Roberto J. Bayardo, Jr. , Daniel P. Miranker, Processing queries for first-few answers, Proceedings of the fifth international conference on Information and knowledge management, p.45-52, November 12-16, 1996, Rockville, Maryland, United States
[doi> 10.1145/238355.238372]
|
| |
Bra84
|
|
| |
CCS93
|
E. F. Codd, S. B. Codd, and C. T. Sal- Icy. Providing OLAP (on-line analyticalprocessing) to user-analysts: an IT mandate.URL http://www.arborsoft.com/papers/coddTOC.htmI., I993.
|
| |
CGL83
|
T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing the sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.
|
| |
CS96
|
|
 |
DKO+84
|
David J DeWitt , Randy H Katz , Frank Olken , Leonard D Shapiro , Michael R Stonebraker , David Wood, Implementation techniques for main memory database systems, Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts
|
| |
GBLP96
|
Jim Gray , Adam Bosworth , Andrew Layman , Hamid Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total, Proceedings of the Twelfth International Conference on Data Engineering, p.152-159, February 26-March 01, 1996
|
| |
GHQ95
|
|
| |
Haa96a
|
P. J. Haas. Hoeffding inequalities for join-selectivity estimation and online aggregation. IBM Research Report RJ 10040, IBM Almaden Research Center, San Jose, CA, 1996.
|
| |
Haa96b
|
P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. IBM Research Report RJ 10050, IBM Almaden Research Center, San Jose, CA, 1996.
|
 |
HN96
|
|
| |
HNSS96
|
|
 |
HOD91
|
Wen-Chi Hou , Gultekin Ozsoyoglu , Erdogan Dogdu, Error-constrained COUNT query evaluation in relational databases, Proceedings of the 1991 ACM SIGMOD international conference on Management of data, p.278-287, May 29-31, 1991, Denver, Colorado, United States
|
| |
Hoe63
|
W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13- 30, 1963.
|
 |
HOT88
|
|
 |
HOT89
|
|
| |
LNSS93
|
|
 |
Mye85
|
|
| |
Olk93
|
F. Olken. Random Sampling from Databases. PhD thesis, University of California, Berkeley, 1993.
|
| |
Pos95
|
Postgres95 home page,1995. UFI.L http://www, ki.net / postgres95.
|
 |
SAC+79
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
 |
SHP+96
|
Praveen Seshadri , Joseph M. Hellerstein , Hamid Pirahesh , T. Y. Cliff Leung , Raghu Ramakrishnan , Divesh Srivastava , Peter J. Stuckey , S. Sudarshan, Cost-based optimization for magic: algebra and implementation, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.435-446, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
SPL96
|
|
| |
VU92
|
|
| |
VL93
|
|
| |
WA91
|
|
| |
YL95
|
|
CITED BY 161
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ron Avnur , Joseph M. Hellerstein , Bruce Lo , Chris Olston , Bhaskaran Raman , Vijayshankar Raman , Tali Roth , Kirk Wylie, CONTROL: continuous output and navigation technology with refinement on-line, ACM SIGMOD Record, v.27 n.2, p.567-569, June 1998
|
|
|
Jeffrey Scott Vitter , Min Wang , Bala Iyer, Data cube approximation and histograms via wavelets, Proceedings of the seventh international conference on Information and knowledge management, p.96-104, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Anna Fredrikson , Chris North , Catherine Plaisant , Ben Shneiderman, Temporal, geographical and categorical aggregations viewed through coordinated displays: a case study with highway incident data, Proceedings of the 1999 workshop on new paradigms in information visualization and manipulation in conjunction with the eighth ACM internation conference on Information and knowledge management, p.26-34, November 02-06, 1999, Kansas City, Missouri, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Leonidas Fegaras , David Levine , Sujoe Bose , Vamsi Chaluvadi, Query processing of streamed XML data, Proceedings of the eleventh international conference on Information and knowledge management, November 04-09, 2002, McLean, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
Anna C. Gilbert , Yannis Kotidis , S. Muthukrishnan , Marin J. Strauss, Optimal and approximate computation of summary statistics for range aggregates, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.227-236, May 2001, Santa Barbara, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yannis Sismanis , Antonios Deligiannakis , Yannis Kotidis , Nick Roussopoulos, Hierarchical dwarfs for the rollup cube, Proceedings of the 6th ACM international workshop on Data warehousing and OLAP, November 07-07, 2003, New Orleans, Louisiana, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Joseph M. Hellerstein , Ron Avnur , Andy Chou , Christian Hidber , Chris Olston , Vijayshankar Raman , Tali Roth , Peter J. Haas, Interactive Data Analysis: The Control Project, Computer, v.32 n.8, p.51-59, August 1999
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
Edith Cohen , Mayur Datar , Shinji Fujiwara , Aristides Gionis , Piotr Indyk , Rajeev Motwani , Jeffrey D. Ullman , Cheng Yang, Finding Interesting Associations without Support Pruning, IEEE Transactions on Knowledge and Data Engineering, v.13 n.1, p.64-78, January 2001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Christopher Jermaine , Alin Dobra , Subramanian Arumugam , Shantanu Joshi , Abhijit Pol, A disk-based join with probabilistic guarantees, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Edith Cohen , Nick Duffield , Haim Kaplan , Carsten Lund , Mikkel Thorup, Sketching unaggregated data streams for subpopulation-size queries, Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 11-13, 2007, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
Amol Deshpande , Carlos Guestrin , Samuel R. Madden , Joseph M. Hellerstein , Wei Hong, Model-driven data acquisition in sensor networks, Proceedings of the Thirtieth international conference on Very large data bases, p.588-599, August 31-September 03, 2004, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jens-Peter Dittrich , Bernhard Seeger , David Scot Taylor , Peter Widmayer, Progressive merge join: a generic and non-blocking sort-based join algorithm, Proceedings of the 28th international conference on Very Large Data Bases, p.299-310, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
Don Carney , Uǧur Çetintemel , Mitch Cherniack , Christian Convey , Sangdon Lee , Greg Seidman , Michael Stonebraker , Nesime Tatbul , Stan Zdonik, Monitoring streams: a new class of data management applications, Proceedings of the 28th international conference on Very Large Data Bases, p.215-226, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jochen Van den Bercken , Björn Blohsfeld , Jens-Peter Dittrich , Jürgen Krämer , Tobias Schäfer , Martin Schneider , Bernhard Seeger, XXL - A Library Approach to Supporting Efficient Implementations of Advanced Database Queries, Proceedings of the 27th International Conference on Very Large Data Bases, p.39-48, September 11-14, 2001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Daniel J. Abadi , Don Carney , Ugur Çetintemel , Mitch Cherniack , Christian Convey , Sangdon Lee , Michael Stonebraker , Nesime Tatbul , Stan Zdonik, Aurora: a new model and architecture for data stream management, The VLDB Journal — The International Journal on Very Large Data Bases, v.12 n.2, p.120-139, August 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Florin Rusu , Fei Xu , Luis Leopoldo Perez , Mingxi Wu , Ravi Jampani , Chris Jermaine , Alin Dobra, The DBO database system, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
|
|
|
Ravi Jampani , Fei Xu , Mingxi Wu , Luis Leopoldo Perez , Christopher Jermaine , Peter J. Haas, MCDB: a monte carlo approach to managing uncertain data, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
Eric Lo , Ben Kao , Wai-Shing Ho , Sau Dan Lee , Chun Kit Chui , David W. Cheung, OLAP on sequence data, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
|
|
|
|
|
|
Edith Cohen , Nick Duffield , Haim Kaplan , Carsten Lund , Mikkel Thorup, Algorithms and estimators for accurate summarization of internet traffic, Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, October 24-26, 2007, San Diego, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Katja Hose , Armin Roth , André Zeitz , Kai-Uwe Sattler , Felix Naumann, A research agenda for query processing in large-scale peer data management systems, Information Systems, v.33 n.7-8, p.597-610, November, 2008
|
|
|
|
|
|
|
|
|
Ramakrishna Varadarajan , Vagelis Hristidis , Louiqa Raschid , Maria-Esther Vidal , Luis Ibáñez , Héctor Rodríguez-Drumond, Flexible and efficient querying and ranking on hyperlinked data sources, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wei Fan , Haixun Wang , Philip S. Yu , Shaw-Hwa Lo, Inductive learning in less than one sequential data scan, Proceedings of the 18th international joint conference on Artificial intelligence, p.595-600, August 09-15, 2003, Acapulco, Mexico
|
|