ACM Home Page
Please provide us with feedback. Feedback
The Sort-Merge-Shrink join
Full text PdfPdf (501 KB)
Source ACM Transactions on Database Systems (TODS) archive
Volume 31 ,  Issue 4  (December 2006) table of contents
Pages: 1382 - 1416  
Year of Publication: 2006
ISSN:0362-5915
Authors
Christopher Jermaine  University of Florida, Gainesville, FL
Alin Dobra  University of Florida, Gainesville, FL
Subramanian Arumugam  University of Florida, Gainesville, FL
Shantanu Joshi  University of Florida, Gainesville, FL
Abhijit Pol  University of Florida, Gainesville, FL
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 95,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1189769.1189775
What is a DOI?

ABSTRACT

One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm called the Sort-Merge-Shrink (SMS) Join for computing the answer to such a query over large, disk-based input tables. The key innovation of the SMS join is that if the input data are clustered in a statistically random fashion on disk, then at all times, the join provides an online, statistical estimator for the eventual answer to the query as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy or run the algorithm to completion with a total time requirement that is not much longer than that of other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into main memory.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
Alon, N., Gibbons, P. B., Matias, Y., and Szegedy, M. 2002. Tracking join and self-join sizes in limited storage. J. Comput. Syst. Sci. 64, 3, 719--747.
4
 
5
Cochran, W. 1977. Sampling Techniques. John Wiley and Sons.
6
 
7
Dittrich, J-P., Seeger, B., Taylor, D. S., and Widmayer, P. 2002. Progressive merge join: A generic and non-blocking sort-based join Algorithm. VLDB Conference 299--310.
8
9
10
11
 
12
 
13
14
 
15
16
17
18
19
20
21
 
22
23
24
 
25
26
27
 
28
 
29
Olken, F. 1993. Raridom sampling from databases. PhD thesis, University of California, Berkeley, CA.
 
30
Shao, J. 1999. Mathematical Statistics, Springer-Verlag.
31
 
32
Urhan, T. and Franklin, M. J. 2000. XJoin: A reactively-scheduled pipelined join operator. IEEE Data Eng. Bull. 23, 2, 27--33.

Collaborative Colleagues:
Christopher Jermaine: colleagues
Alin Dobra: colleagues
Subramanian Arumugam: colleagues
Shantanu Joshi: colleagues
Abhijit Pol: colleagues