|
ABSTRACT
Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm, in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide “sanity bounds” to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
BDT83
|
|
 |
Chr83a
|
|
| |
Chr83b
|
S Chrxstodoulakxs Estxmatmg record selectlvltles Informatzon Systems, 8(2) 69- 79, 1983
|
| |
Dem80
|
R Demolombe Estimation of the number of tuples satxsfymg a query expressed in predicate calculus language In Proc Szzth VLDB, pages 55-63, 1980
|
 |
Fed84
|
|
| |
Fel68
|
W Feller An Introduction to Probability Theory and Its Applzcatsons, volume 1 John Wiley and Sons, inc, New York, New York, 1968
|
 |
HOT88
|
|
 |
HOT89
|
|
 |
HTY82
|
|
 |
KK85
|
|
| |
Koo80
|
P~ Kool The optzm~zat~on of quemes zn relatzonal database systems PhD thesis, Case Western Umverslty, Cleveland, Ohio, 1980
|
| |
LN89
|
|
 |
LN90
|
|
| |
LNS90
|
1~ Lipton and J Naughton and D Schneider Practical Selectlwty Estimation through Adaptive Samphng Umverslty of Wisconsin-Madison Computer Sciences Department Techmcal Report, March 1990
|
| |
Lyn88
|
|
 |
MCS88
|
|
 |
MD88
|
|
| |
MDL83
|
A Montgomery, D D'Souza, and S Lee The cost of relatmnal algebraic operatmns on skewed data Estimates and experiments Information Processing Letters, pages 235-241, 1983
|
| |
MK85
|
B Muthuswamy and G Kerschberg A DDSM for relatmnal query optlmlzatmn Techmcal report, Umverslty of South Carohna, Columbia, 1985 As cited in {MCS88}
|
| |
OR86
|
|
| |
OR89
|
|
 |
PSC84
|
|
 |
SAC+79
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
CITED BY 85
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sanjeev Khanna , S. Muthukrishnan , Mike Paterson, On approximating rectangle tiling and packing, Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, p.384-393, January 25-27, 1998, San Francisco, California, United States
|
|
|
Peter J. Haas , Jeffrey F. Naughton , S. Seshadri , Arun N. Swami, Fixed-precision estimation of join selectivity, Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.190-201, May 25-28, 1993, Washington, D.C., United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jeffrey Scott Vitter , Min Wang , Bala Iyer, Data cube approximation and histograms via wavelets, Proceedings of the seventh international conference on Information and knowledge management, p.96-104, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
Peter J. Haas , Jeffrey F. Naughton , Arun N. Swami, On the relative cost of sampling for join selectivity estimation, Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.14-24, May 24-27, 1994, Minneapolis, Minnesota, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jason Tsong-Li Wang , Gung-Wei Chirn , Thomas G. Marr , Bruce Shapiro , Dennis Shasha , Kaizhong Zhang, Combinatorial pattern discovery for scientific data: some preliminary results, ACM SIGMOD Record, v.23 n.2, p.115-125, June 1994
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Kamil Saraç , Ömer Eğecioǧlu , Amr El Abbadi, Iterated DFT based techniques for join size estimation, Proceedings of the seventh international conference on Information and knowledge management, p.348-355, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
Brian Dunkel , Qiang Zhu , Wing Lau , Suyun Chen, Multiple-granularity interleaving for piggyback query processing, Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research, p.2, November 08-11, 1999, Mississauga, Ontario, Canada
|
|
|
Qiang Zhu , Brian Dunkel , Nandit Soparkar , Suyun Chen , Berni Schiefer , Tony Lai, A piggyback method to collect statistics for query optimization in database management systems, Proceedings of the 1998 conference of the Centre for Advanced Studies on Collaborative research, p.25, November 30-December 03, 1998, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Carlo Dell'aquila , Ezio Lefons , Filippo Tangorra, Analytic-based estimation of query result sizes, Proceedings of the 4th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering Data Bases, p.1-7, February 13-15, 2005, Salzburg, Austria
|
|
|
|
|
|
Jizhou Luo , Xiaofang Zhou , Yu Zhang , Heng Tao Shen , Jianzhong Li, Selectivity estimation by batch-query based histogram and parametric method, Proceedings of the eighteenth conference on Australasian database, p.93-102, January 30-February 02, 2007, Ballarat, Victoria, Australia
|
|
|
|
|
|
Xuemin Lin , Qing Liu , Yidong Yuan , Xiaofang Zhou, Multiscale histograms: summarizing topological relations in large spatial datasets, Proceedings of the 29th international conference on Very large data bases, p.814-825, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Feng Yan , Wen-Chi Hou , Zhewei Jiang , Cheng Luo , Qiang Zhu, Selectivity estimation of range queries based on data density approximation via cosine series, Data & Knowledge Engineering, v.63 n.3, p.855-878, December, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Cheng Luo , Zhewei Jiang , Wen-Chi Hou , Feng Yu , Qiang Zhu, A sampling approach for XML query selectivity estimation, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|