| Join synopses for approximate query answering |
| Full text |
Pdf
(1.54 MB)
|
| Source
|
International Conference on Management of Data
archive
Proceedings of the 1999 ACM SIGMOD international conference on Management of data
table of contents
Philadelphia, Pennsylvania, United States
Pages: 275 - 286
Year of Publication: 1999
ISBN:1-58113-084-8
Also published in ...
|
|
Authors
|
|
Swarup Acharya
|
Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
|
|
Phillip B. Gibbons
|
Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
|
|
Viswanath Poosala
|
Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
|
|
Sridhar Ramaswamy
|
Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 17, Downloads (12 Months): 84, Citation Count: 75
|
|
|
ABSTRACT
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate answers for join-queries using only statistics (in particular, samples) from the base relations. We propose join synopses as an effective solution for this problem and show how precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign key joins. We present optimal strategies for allocating the available space among the various join synopses when the query work load is known and identify heuristics for the common case when the work load is not known. We also present efficient algorithms for incrementally maintaining join synopses in the presence of updates to the base relations. Our extensive set of experiments on the TPC-D benchmark database show the effectiveness of join synopses and various other techniques proposed in this paper.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
AGPR99a
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, The Aqua approximate query answering system, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.574-576, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
AGPR99b
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, Join synopses for approximate query answering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.275-286, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
AMS96
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
 |
APR99
|
Swarup Acharya , Viswanath Poosala , Sridhar Ramaswamy, Selectivity estimation in spatial databases, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.13-24, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
BDF+97
|
l). Barbarfi, W. DuMouchel, C. Faloutsos, E J. Haas, J. M. HeUerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, K. A. Ross, and K. C. Sevcik. The New Jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 20(4):3-45, 1997.
|
 |
CR94
|
|
 |
GGMS96
|
Sumit Ganguly , Phillip B. Gibbons , Yossi Matias , Avi Silberschatz, Bifocal sampling for skew-resistant join size estimation, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.271-281, June 04-06, 1996, Montreal, Quebec, Canada
|
 |
GM98
|
|
| |
GM99a
|
E B. Gibbons and Y. Matins. Selecting estimation proce.- dures and bounds for approximate answering of aggregation queries. Technical report, Bell Laboratories, Murray Hill, New Jersey, 1999.
|
| |
GM99b
|
E B. Gibbons and Y. Matins. Synopsis data structures for mas.. sive data sets. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, 1999. To appear. Available,. as Bell Labs tech. rep., Sept. 1998, and at http://www.belllabs. corn/- pbgibbons/.
|
| |
GMP97a
|
E B. Gibbons, Y. Matias, and V. Poosala. Aqua project white paper. Technical report, Bell Laboratories, Murray Hill, New Jersey, December 1997.
|
| |
GMP97b
|
|
| |
Haa96
|
P.J. Haas. Hoeffding inequalities for join-selectivity estimation and online aggregation. Technical Report RJ 10040, IBM Almaden Research Center, San Jose, CA, 1996.
|
| |
Haa97
|
|
 |
HHW97
|
Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States
|
 |
HNS94
|
Peter J. Haas , Jeffrey F. Naughton , Arun N. Swami, On the relative cost of sampling for join selectivity estimation, Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.14-24, May 24-27, 1994, Minneapolis, Minnesota, United States
[doi> 10.1145/182591.182594]
|
| |
HNSS95
|
|
 |
HÖT88
|
|
| |
Koo80
|
|
| |
LN95
|
|
 |
LNS90
|
Richard J. Lipton , Jeffrey F. Naughton , Donovan A. Schneider, Practical selectivity estimation through adaptive sampling, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.1-11, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
OR92
|
|
 |
PIHS96
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
Poo97
|
|
 |
SAC+79
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
Sch97
|
D. Schneider. The ins & outs (and everything in between) of data warehousing. Tutorial in the 23rd International Conf. on Very Large Data Bases, August 1997.
|
| |
VL93
|
|
CITED BY 75
|
|
|
|
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
Dimuthu Makawita , Kian-Lee Tan , Huan Liu, Sampling from databases using B+-trees, Proceedings of the ninth international conference on Information and knowledge management, p.158-164, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yufei Tao , Man Lung Yiu , Dimitris Papadias , Marios Hadjieleftheriou , Nikos Mamoulis, RPJ: producing fast join results on streams through rate-based optimization, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
B. Lee , T. Critchlow , G. Abdulla , C. Baldwin , R. Kamimura , R. Musick , R. Snapp , N. Tang, The framework for approximate queries on simulation data, Information Sciences—Informatics and Computer Science: An International Journal, v.157 n.1-2, p.3-20, December 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
|
|
|
|
|
|
|
|
|
|
|
|
Bin Cui , Heng Tao Shen , Jialie Shen , Kian-Lee Tan, Exploring bit-difference for approximate KNN search in high-dimensional databases, Proceedings of the sixteenth Australasian database conference, p.165-174, January 01, 2005, Newcastle, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|