| An instant and accurate size estimation method for joins and selections in a retrieval-intensive environment |
| Full text |
Pdf
(1.05 MB)
|
| Source
|
International Conference on Management of Data
archive
Proceedings of the 1993 ACM SIGMOD international conference on Management of data
table of contents
Washington, D.C., United States
Pages: 79 - 88
Year of Publication: 1993
ISBN:0-89791-592-5
Also published in ...
|
|
Authors
|
|
Wei Sun
|
School of Computer Science, Florida International University, Miami, Florida
|
|
Yibei Ling
|
School of Computer Science, Florida International University, Miami, Florida
|
|
Naphtali Rishe
|
School of Computer Science, Florida International University, Miami, Florida
|
|
Yi Deng
|
School of Computer Science, Florida International University, Miami, Florida
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 4, Downloads (12 Months): 24, Citation Count: 21
|
|
|
ABSTRACT
This paper proposes a novel strategy for estimating the size of the resulting relation after an equi-join and selection using a regression model. An approximating series representing the underlying data distribution and dependency is derived from the actual data. The proposed method provides an instant and accurate size estimation by performing an evaluation of the series, with no run-time overheads in page faults and space, and with negligible CPU overhead. In contrast, the popular sampling methods incur run-time overheads in page faults (for sampling), CPU time and space. These overheads of sampling methods increase the response time of processing a query. The results of a comprehensive experimental study are also reported, which demonstrate that the estimation accuracy by the proposed method is comparable with that of the sampling methods which are believed to provide the most accurate estimation. The proposed method seems ideal for retrieval-intensive database and information systems. Since the overheads involved in deriving the approximating series are fairly moderate, we believe that this method is also an extremely competent method when moderate or periodical updates are present.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Ahrens, J.H. and Dieter, U., "Extensions of Forsythe's Method for Random Sampling from the Normal Distribution". Math. Compu., 27, 124 (Oct. 1973), pp. 92%937.
|
| |
2
|
Christodoulakis, S. "Estimating Record Selectivities", Inf. Syst. 8, 2 1983, pp. 105-115.
|
 |
3
|
|
 |
4
|
|
| |
5
|
Gerard P. Weeg, Georgia B. Reed, "Introduction to Numerical Analysis", Blaisdell Publishing Company, 1966, pp. 63-72.
|
 |
6
|
|
 |
7
|
Wen-Chi Hou , Gultekin Ozsoyoglu , Erdogan Dogdu, Error-constrained COUNT query evaluation in relational databases, Proceedings of the 1991 ACM SIGMOD international conference on Management of data, p.278-287, May 29-31, 1991, Denver, Colorado, United States
|
 |
8
|
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
Marvin J. Karson, "Multivariate Statistical Methods", The Iowa State University Press, 1982.
|
| |
13
|
|
| |
14
|
|
 |
15
|
|
 |
16
|
|
 |
17
|
Richard J. Lipton , Jeffrey F. Naughton , Donovan A. Schneider, Practical selectivity estimation through adaptive sampling, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.1-11, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
18
|
Luk, W. S. and Black, P. A., "On Cost Estimation in Processing a Query in a Distributed Database System'# Proc. of the IEEE 5th COMSAC, Chicago, IL, Nov. 1981, pp. 24-32.
|
| |
19
|
|
| |
20
|
M. :I. Maron, "Numerical Analysis# A practical approach", Macmillan Publishing Company, 1987.
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
 |
24
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
| |
25
|
|
| |
26
|
Wolf, 3., Dias, D., Yu, P., and Turek, J., "A Parallel Hash-:loin Algorithm for Managing Data Skew", Tech Report RC 16489, IBM Watson Center, 1991.
|
 |
27
|
|
 |
28
|
|
CITED BY 21
|
|
|
|
|
|
|
|
Sanjeev Khanna , S. Muthukrishnan , Mike Paterson, On approximating rectangle tiling and packing, Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, p.384-393, January 25-27, 1998, San Francisco, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Carlo Dell'aquila , Ezio Lefons , Filippo Tangorra, Analytic-based estimation of query result sizes, Proceedings of the 4th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering Data Bases, p.1-7, February 13-15, 2005, Salzburg, Austria
|
|
|
|
|
|
Jizhou Luo , Xiaofang Zhou , Yu Zhang , Heng Tao Shen , Jianzhong Li, Selectivity estimation by batch-query based histogram and parametric method, Proceedings of the eighteenth conference on Australasian database, p.93-102, January 30-February 02, 2007, Ballarat, Victoria, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Feng Yan , Wen-Chi Hou , Zhewei Jiang , Cheng Luo , Qiang Zhu, Selectivity estimation of range queries based on data density approximation via cosine series, Data & Knowledge Engineering, v.63 n.3, p.855-878, December, 2007
|
|