|
ABSTRACT
We compare the cost of estimating the selectivity of a “star join” using sampling procedure t-cross to the cost of simply computing the join and obtaining the exact answer. Our bounds and approximations for the relative cost of sampling show how this cost depends on the size of the input relations, the number of input relations, and the precision criterion used by the estimation procedure. We also demonstrate the deleterious effect of dangling tuples and the mixed effect of data skew on the relative cost of sampling. These results provide insight into when sampling should or should not be used for join selectivity estimation.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
DNS92
|
|
 |
HN+93a
|
Peter J. Haas , Jeffrey F. Naughton , S. Seshadri , Arun N. Swami, Fixed-precision estimation of join selectivity, Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.190-201, May 25-28, 1993, Washington, D.C., United States
[doi> 10.1145/153850.153875]
|
| |
HN+93b
|
Haas, P. J., Naughton, J. F., Seshadri, S., and Swami, A. N. (1993). Selectivity and cost estimation for joins based on random sampling. Technical Report RJ 9577. IBM Almaden Research Center. San Jose, CA.
|
 |
HS92a
|
|
 |
HS92b
|
|
 |
HOT88
|
|
 |
HOT89
|
|
 |
HOD91
|
Wen-Chi Hou , Gultekin Ozsoyoglu , Erdogan Dogdu, Error-constrained COUNT query evaluation in relational databases, Proceedings of the 1991 ACM SIGMOD international conference on Management of data, p.278-287, May 29-31, 1991, Denver, Colorado, United States
|
| |
LN89
|
|
 |
LN90
|
|
 |
LNS90
|
Richard J. Lipton , Jeffrey F. Naughton , Donovan A. Schneider, Practical selectivity estimation through adaptive sampling, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.1-11, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
OR86
|
|
| |
OR89
|
|
 |
ORX90
|
Frank Olken , Doron Rotem , Ping Xu, Random sampling from hash files, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.375-386, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
Olk93
|
Olken, F. (1993). Random Sampling from Databases. Ph.D. Dissertation. Department of Computer Science. University of California at Berkeley. Berkeley, California.
|
| |
Ses92
|
|
|