|
ABSTRACT
Cardinality estimation during query optimization relies on simplifying assumptions that usually do not hold in practice. To diminish the impact of inaccurate estimates during optimization, statistics on query expressions (SITs) have been previously proposed. These statistics help directly model the distribution of tuples on query sub-plans. Past work in statistics on query expressions has exploited view matching technology to harness their benefits. In this paper we argue against such an approach as it overlooks significant opportunities for improvement in cardinality estimations. We then introduce a framework to reason with SITs based on the notion of conditional selectivity. We present a dynamic programming algorithm to efficiently find the most accurate selectivity estimation for given queries, and discuss how such an approach can be incorporated into existing optimizers with a small number of changes. Finally, we demonstrate experimentally that our technique results in superior cardinality estimations than previous approaches with very little overhead.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, Join synopses for approximate query answering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.275-286, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
3
|
|
 |
4
|
|
 |
5
|
Nicolas Bruno , Surajit Chaudhuri , Luis Gravano, STHoles: a multidimensional workload-aware histogram, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.211-222, May 21-24, 2001, Santa Barbara, California, United States
|
| |
6
|
|
| |
7
|
|
 |
8
|
Amol Deshpande , Minos Garofalakis , Rajeev Rastogi, Independence is good: dependency-based histogram synopses for high-dimensional data, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.199-210, May 21-24, 2001, Santa Barbara, California, United States
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
 |
12
|
Lise Getoor , Benjamin Taskar , Daphne Koller, Selectivity estimation using probabilistic models, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.461-472, May 21-24, 2001, Santa Barbara, California, United States
|
| |
13
|
|
 |
14
|
Jonathan Goldstein , Per-Åke Larson, Optimizing queries using materialized views: a practical, scalable solution, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.331-342, May 21-24, 2001, Santa Barbara, California, United States
|
| |
15
|
G. Graefe. The Cascades framework for query optimization. Data Engineering Bulletin, 18(3), 1995.
|
 |
16
|
Dimitrios Gunopulos , George Kollios , Vassilis J. Tsotras , Carlotta Domeniconi, Approximating multi-dimensional aggregate range queries over real attributes, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.463-474, May 15-18, 2000, Dallas, Texas, United States
|
 |
17
|
L. M. Haas , J. C. Freytag , G. M. Lohman , H. Pirahesh, Extensible query processing in starburst, Proceedings of the 1989 ACM SIGMOD international conference on Management of data, p.377-388, June 1989, Portland, Oregon, United States
|
| |
18
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
 |
22
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
23
|
|
 |
24
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
25
|
|
| |
26
|
F. Wass, C. Galindo-Legaria, M.-C. Wu, and M. Joshi. Statistics on views. In Proceedings of the 29th International Conference on Very Large Databases (VLDB), 2003.
|
CITED BY 9
|
|
V. Markl , N. Megiddo , M. Kutsch , T. M. Tran , P. Haas , U. Srivastava, Consistently estimating the selectivity of conjuncts of predicates, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
|
|
|
|
|
|
V. Markl , P. J. Haas , M. Kutsch , N. Megiddo , U. Srivastava , T. M. Tran, Consistent selectivity estimation via maximum entropy, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.1, p.55-76, January 2007
|
|
|
|
|
|
|
|
|
|
|
|
Pawel Terlecki , Hardik Bati , Cesar Galindo-Legaria , Peter Zabback, Filtered statistics, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|