ACM Home Page
Please provide us with feedback. Feedback
On the relative cost of sampling for join selectivity estimation
Full text PdfPdf (947 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems table of contents
Minneapolis, Minnesota, United States
Pages: 14 - 24  
Year of Publication: 1994
ISBN:0-89791-642-5
Authors
Peter J. Haas  IBM Almaden Research Center
Jeffrey F. Naughton  University of Wisconsin - Madison
Arun N. Swami  IBM Almaden Research Center
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 1,   Downloads (12 Months): 33,   Citation Count: 11
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/182591.182594
What is a DOI?

ABSTRACT

We compare the cost of estimating the selectivity of a “star join” using sampling procedure t-cross to the cost of simply computing the join and obtaining the exact answer. Our bounds and approximations for the relative cost of sampling show how this cost depends on the size of the input relations, the number of input relations, and the precision criterion used by the estimation procedure. We also demonstrate the deleterious effect of dangling tuples and the mixed effect of data skew on the relative cost of sampling. These results provide insight into when sampling should or should not be used for join selectivity estimation.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
DNS92
HN+93a
 
HN+93b
Haas, P. J., Naughton, J. F., Seshadri, S., and Swami, A. N. (1993). Selectivity and cost estimation for joins based on random sampling. Technical Report RJ 9577. IBM Almaden Research Center. San Jose, CA.
HS92a
HS92b
HOT88
HOT89
HOD91
 
LN89
LN90
LNS90
 
OR86
 
OR89
ORX90
 
Olk93
Olken, F. (1993). Random Sampling from Databases. Ph.D. Dissertation. Department of Computer Science. University of California at Berkeley. Berkeley, California.
 
Ses92

CITED BY  11

Collaborative Colleagues:
Peter J. Haas: colleagues
Jeffrey F. Naughton: colleagues
Arun N. Swami: colleagues