|
ABSTRACT
Accurate cardinality estimation is critically important to high-quality query optimization. It is well known that conventional cardinality estimation based on histograms or similar statistics may produce extremely poor estimates in a variety of situations, for example, queries with complex predicates, correlation among columns, or predicates containing user-defined functions. In this paper, we propose a new, general cardinality estimation technique that combines random sampling and materialized view technology to produce accurate estimates even in these situations. As a major innovation, we exploit feedback information from query execution and process control techniques to assure that estimates remain statistically valid when the underlying data changes. Experimental results based on a prototype implementation in Microsoft SQL Server demonstrate the practicality of the approach and illustrate the dramatic effects improved cardinality estimates may have.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
M. Abramowitz and I. Stegun. Handbook of Mathematical Functions. Dover Publications, 1966.
|
| |
2
|
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. The Aqua approximate query answering system. In SIGMOD, pages 574--576, 1999.
|
| |
3
|
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. In SIGMOD, pages 275--286, 1999.
|
| |
4
|
L. D. Brown, T. Cai, and A. DasGupta. Interval estimation for a binomial proportion. Statistical Science, 16:101--133, 2001.
|
| |
5
|
J. Bunge and M. Fitzpatrick. Estimating the number of species: A review. Journal of the American Statistical Association, 88:364--373, 1993.
|
| |
6
|
A. Chao. Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, Theory and Applications, 11:265--270, 1984.
|
| |
7
|
A. Chao and S. M. Lee. Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87:210--217, 1990.
|
 |
8
|
Moses Charikar , Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, Towards estimation error guarantees for distinct values, Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.268-279, May 15-18, 2000, Dallas, Texas, United States
[doi> 10.1145/335168.335230]
|
| |
9
|
S. Chaudhuri, R. Motwani, and V. Narasayya. Random sampling for histogram construction: how much is enough? In SIGMOD, pages 436--447, 1998.
|
| |
10
|
S. Chaudhuri, R. Motwani, and V. Narasayya. On Random Sampling over Joins. In SIGMOD, pages 263--274, 1999.
|
| |
11
|
C. A. Galindo-Legaria, M. Joshi, F. Waas, and M. C. Wu. Statistics on views. In VLDB, pages 952--962, 2003.
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
P. J. Haas and L. Stokes. Estimating the number of classes in a finite population. Journal of the American Statistical Association, 93:1475--1587, 1998.
|
| |
17
|
P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. In SIGMOD, pages 341--350, 1992.
|
 |
18
|
Ihab F. Ilyas , Volker Markl , Peter Haas , Paul Brown , Ashraf Aboulnaga, CORDS: automatic discovery of correlations and soft functional dependencies, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007641]
|
 |
19
|
|
 |
20
|
Volker Markl , Vijayshankar Raman , David Simmen , Guy Lohman , Hamid Pirahesh , Miso Cilimdzic, Robust query processing through progressive optimization, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007642]
|
| |
21
|
Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In SIGMOD, pages 448--459, 1998.
|
| |
22
|
N. N. Engineering Statistics Handbook. National Institute of Standards and Technology, http://www.itl.nist.gov/div898/handbook, 2006.
|
| |
23
|
F. Olken and D. Rotem. Random sampling from database files: A survey. In SSDBM, pages 92--111, 1990.
|
| |
24
|
|
| |
25
|
J. K. Ord and G. A. Whitmore. The Poisson-inverse Gaussian distribution as a model for species abundance. Communications in Statistics, 15(3), 1986.
|
| |
26
|
J. Sauro and J. R. Lewis. Estimation completion rates from small samples using binomial confidence intervals: Comparison and recommendations. In Proc. Human and Ergonomics Society 49th Annual Meeting, pages 2100--2104, 2005.
|
| |
27
|
A. Shlosser. On estimation of the size of the dictionary of a long text on the basis of a sample. Engineering Cybernetics, 19:97--102, 1981.
|
| |
28
|
G. B. Wetherill and K. D. Glazebrook. Sequential methods in statistics, 3rd ed. Chapman & Hall, 1986.
|
| |
29
|
E. Wilson. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22:209--212, 1927.
|
| |
30
|
D. Zelterman. Robust estimation in truncated discrete distributions with applications to capture-recapture experiments. Journal of Statistical Planning and Inference, 18:225--237, 1988.
|
|