An instant and accurate size estimation method for joins and selections in a retrieval-intensive environment
Source: International Conference on Management of Data
Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data
Washington, D.C., United States
Pages: 79-88
Year of Publication: 1993
ISBN: 0-89791-592-5
Authors
Wei Sun  School of Computer Science, Florida International University, Miami, Florida
Yibei Ling  School of Computer Science, Florida International University, Miami, Florida
Naphtali Rishe  School of Computer Science, Florida International University, Miami, Florida
Yi Deng  School of Computer Science, Florida International University, Miami, Florida
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/170035.170055

ABSTRACT

This paper proposes a novel strategy, based on a regression model, for estimating the size of the relation that results from an equi-join and selection. An approximating series that represents the underlying data distribution and dependency is derived from the actual data. The proposed method provides an instant and accurate size estimate by evaluating the series, with no run-time overhead in page faults or space and with negligible CPU overhead. In contrast, the popular sampling methods incur run-time overheads in page faults (for sampling), CPU time, and space, and these overheads increase the response time of query processing. The results of a comprehensive experimental study are also reported; they demonstrate that the estimation accuracy of the proposed method is comparable to that of the sampling methods, which are believed to provide the most accurate estimates. The proposed method seems ideal for retrieval-intensive database and information systems. Since the overhead of deriving the approximating series is fairly moderate, we believe that the method is also well suited to settings with moderate or periodic updates.
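
The general idea of replacing run-time sampling with the evaluation of a precomputed series can be illustrated with a small sketch. The code below is not the authors' algorithm, which the abstract does not spell out. It assumes, purely for illustration, a least-squares polynomial fitted offline to the cumulative distribution of a single numeric attribute, and it covers only a range selection, not the equi-join case. The names `fit_polynomial`, `build_estimator`, and `estimate_range`, the degree, and the grid size are all hypothetical choices.

```python
import bisect
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations for the Vandermonde matrix A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(ata[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (aty[i] - tail) / ata[i][i]
    return coeffs


def build_estimator(values, degree=5, grid=50):
    """Offline step: fit a series to the attribute's empirical CDF.

    Assumption: one numeric attribute; degree and grid size are arbitrary.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo, hi = ordered[0], ordered[-1]
    span = (hi - lo) or 1.0  # guard against a constant column
    # Sample the empirical CDF on an evenly spaced grid, rescaled to [0, 1].
    xs = [i / grid for i in range(grid + 1)]
    ys = [bisect.bisect_right(ordered, lo + x * span) / n for x in xs]
    coeffs = fit_polynomial(xs, ys, degree)

    def cdf(v):
        x = min(1.0, max(0.0, (v - lo) / span))
        y = sum(c * x ** k for k, c in enumerate(coeffs))
        return min(1.0, max(0.0, y))

    def estimate_range(a, b):
        """Query-time step: estimated row count for a < attribute <= b."""
        return n * max(0.0, cdf(b) - cdf(a))

    return estimate_range


# Usage on synthetic, skewed data (hypothetical example values).
random.seed(0)
column = [100 * random.random() ** 0.5 for _ in range(10000)]
estimate_range = build_estimator(column)
actual = sum(1 for v in column if 20 < v <= 60)
print("estimated:", round(estimate_range(20, 60)), "actual:", actual)
```

In this sketch the fitting cost is paid once, when the data is loaded or periodically refreshed, and each estimate at query time is a handful of multiplications with no data access. That is the trade-off the abstract contrasts with sampling.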


