| Balancing histogram optimality and practicality for query result size estimation |
| Full text |
Pdf
(1.37 MB)
|
| Source
|
International Conference on Management of Data
archive
Proceedings of the 1995 ACM SIGMOD international conference on Management of data
table of contents
San Jose, California, United States
Pages: 233 - 244
Year of Publication: 1995
ISBN:0-89791-731-6
Also published in ...
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 81, Citation Count: 84
|
|
|
ABSTRACT
Many current database systems use histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. In choosing among the various histograms, one has to balance between two conflicting goals: optimality, so that generated estimates have the least error, and practicality, so that histograms can be constructed and maintained efficiently. In this paper, we present both theoretical and experimental results on several issues related to this trade-off. Our overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume the uniform distribution for the rest, and choose for each relation the histogram in that class that is optimal for a self-join query.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
|
| |
4
|
S. Christodoulakis On the estimation and use of selectivities in database performance evaluation. Research Report CS-89-24, Dept. of Computer Science, University of Waterloo, June 1989.
|
| |
5
|
|
| |
6
|
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
Y. Ioannidis and V. Poosala. Balancing histogram optimality and practicality for query result size estimation. Unpublished manuscript, February 1995.
|
 |
13
|
|
| |
14
|
|
 |
15
|
Richard J. Lipton , Jeffrey F. Naughton , Donovan A. Schneider, Practical selectivity estimation through adaptive sampling, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.1-11, May 23-26, 1990, Atlantic City, New Jersey, United States
|
 |
16
|
|
| |
17
|
A. W. Marshall and I. Olkin. Inequalities: Theory of Majorizat~on and Its Applications. Academic Press, New York, NY, 1979.
|
| |
18
|
T. H. Merrett and E. Otoo. Distribution models of relations. In Proc. 5th Int. VLDB Conf., pages 418- 425, Rio de Janeiro, Brazil, October 1979.
|
 |
19
|
|
 |
20
|
|
 |
21
|
|
 |
22
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
 |
23
|
|
| |
24
|
G. K. Zipf. Human Behavior and the Pmnc~ple of Least Effort. Addison-Wesley, Reading, MA, 1949.
|
CITED BY 84
|
|
H. V. Jagadish , Raymond T. Ng , Divesh Srivastava, Substring selectivity estimation, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.249-260, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
Sanjeev Khanna , S. Muthukrishnan , Mike Paterson, On approximating rectangle tiling and packing, Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, p.384-393, January 25-27, 1998, San Francisco, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Kamil Saraç , Ömer Eğecioǧlu , Amr El Abbadi, Iterated DFT based techniques for join size estimation, Proceedings of the seventh international conference on Information and knowledge management, p.348-355, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Nick Koudas , S. Muthukrishnan , Divesh Srivastava, Optimal histograms for hierarchical range queries (extended abstract), Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.196-204, May 15-18, 2000, Dallas, Texas, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Zina Ben Miled , Jin Liu , Omran Bukhres , Huian Li , Jesse Martin , Chavali Balagopalakrishna , Robert Oppelt, Use and Maintenance of Histograms for Large Scientific Database Access Planning: A Case Study of a Pharmaceutical Data Repository, Journal of Intelligent Information Systems, v.23 n.2, p.145-178, September 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Carlo Dell'aquila , Ezio Lefons , Filippo Tangorra, Analytic-based estimation of query result sizes, Proceedings of the 4th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering Data Bases, p.1-7, February 13-15, 2005, Salzburg, Austria
|
|
|
Jizhou Luo , Xiaofang Zhou , Yu Zhang , Heng Tao Shen , Jianzhong Li, Selectivity estimation by batch-query based histogram and parametric method, Proceedings of the eighteenth conference on Australasian database, p.93-102, January 30-February 02, 2007, Ballarat, Victoria, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|