ACM Home Page
Please provide us with feedback. Feedback
Cardinality estimation using sample views with quality assurance
Full text PdfPdf (582 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2007 ACM SIGMOD international conference on Management of data table of contents
Beijing, China
SESSION: Approximate query processing table of contents
Pages: 175 - 186  
Year of Publication: 2007
ISBN:978-1-59593-686-8
Authors
Per-Ake Larson  Microsoft Research, Redmond, WA
Wolfgang Lehner  Dresden Technical University, Dresden, Germany
Jingren Zhou  Microsoft Research, Redmond, WA
Peter Zabback  Microsoft, Redmond, WA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 95,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1247480.1247502
What is a DOI?

ABSTRACT

Accurate cardinality estimation is critically important to high-quality query optimization. It is well known that conventional cardinality estimation based on histograms or similar statistics may produce extremely poor estimates in a variety of situations, for example, queries with complex predicates, correlation among columns, or predicates containing user-defined functions. In this paper, we propose a new, general cardinality estimation technique that combines random sampling and materialized view technology to produce accurate estimates even in these situations. As a major innovation, we exploit feedback information from query execution and process control techniques to assure that estimates remain statistically valid when the underlying data changes. Experimental results based on a prototype implementation in Microsoft SQL Server demonstrate the practicality of the approach and illustrate the dramatic effects improved cardinality estimates may have.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
M. Abramowitz and I. Stegun. Handbook of Mathematical Functions. Dover Publications, 1966.
 
2
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. The Aqua approximate query answering system. In SIGMOD, pages 574--576, 1999.
 
3
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. In SIGMOD, pages 275--286, 1999.
 
4
L. D. Brown, T. Cai, and A. DasGupta. Interval estimation for a binomial proportion. Statistical Science, 16:101--133, 2001.
 
5
J. Bunge and M. Fitzpatrick. Estimating the number of species: A review. Journal of the American Statistical Association, 88:364--373, 1993.
 
6
A. Chao. Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, Theory and Applications, 11:265--270, 1984.
 
7
A. Chao and S. M. Lee. Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87:210--217, 1990.
8
 
9
S. Chaudhuri, R. Motwani, and V. Narasayya. Random sampling for histogram construction: how much is enough? In SIGMOD, pages 436--447, 1998.
 
10
S. Chaudhuri, R. Motwani, and V. Narasayya. On Random Sampling over Joins. In SIGMOD, pages 263--274, 1999.
 
11
C. A. Galindo-Legaria, M. Joshi, F. Waas, and M. C. Wu. Statistics on views. In VLDB, pages 952--962, 2003.
 
12
 
13
14
 
15
 
16
P. J. Haas and L. Stokes. Estimating the number of classes in a finite population. Journal of the American Statistical Association, 93:1475--1587, 1998.
 
17
P. J. Haas and A. N. Swami. Sequential sampling procedures for query size estimation. In SIGMOD, pages 341--350, 1992.
18
19
20
 
21
Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In SIGMOD, pages 448--459, 1998.
 
22
N. N. Engineering Statistics Handbook. National Institute of Standards and Technology, http://www.itl.nist.gov/div898/handbook, 2006.
 
23
F. Olken and D. Rotem. Random sampling from database files: A survey. In SSDBM, pages 92--111, 1990.
 
24
 
25
J. K. Ord and G. A. Whitmore. The Poisson-inverse Gaussian distribution as a model for species abundance. Communications in Statistics, 15(3), 1986.
 
26
J. Sauro and J. R. Lewis. Estimation completion rates from small samples using binomial confidence intervals: Comparison and recommendations. In Proc. Human and Ergonomics Society 49th Annual Meeting, pages 2100--2104, 2005.
 
27
A. Shlosser. On estimation of the size of the dictionary of a long text on the basis of a sample. Engineering Cybernetics, 19:97--102, 1981.
 
28
G. B. Wetherill and K. D. Glazebrook. Sequential methods in statistics, 3rd ed. Chapman & Hall, 1986.
 
29
E. Wilson. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22:209--212, 1927.
 
30
D. Zelterman. Robust estimation in truncated discrete distributions with applications to capture-recapture experiments. Journal of Statistical Planning and Inference, 18:225--237, 1988.


Collaborative Colleagues:
Per-Ake Larson: colleagues
Wolfgang Lehner: colleagues
Jingren Zhou: colleagues
Peter Zabback: colleagues