| The Sort-Merge-Shrink join |
| Full text |
Pdf
(501 KB)
|
| Source
|
ACM Transactions on Database Systems (TODS)
archive
Volume 31 , Issue 4 (December 2006)
table of contents
Pages: 1382 - 1416
Year of Publication: 2006
ISSN:0362-5915
|
|
Authors
|
|
Christopher Jermaine
|
University of Florida, Gainesville, FL
|
|
Alin Dobra
|
University of Florida, Gainesville, FL
|
|
Subramanian Arumugam
|
University of Florida, Gainesville, FL
|
|
Shantanu Joshi
|
University of Florida, Gainesville, FL
|
|
Abhijit Pol
|
University of Florida, Gainesville, FL
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 11, Downloads (12 Months): 95, Citation Count: 0
|
|
|
ABSTRACT
One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm called the Sort-Merge-Shrink (SMS) Join for computing the answer to such a query over large, disk-based input tables. The key innovation of the SMS join is that if the input data are clustered in a statistically random fashion on disk, then at all times, the join provides an online, statistical estimator for the eventual answer to the query as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy or run the algorithm to completion with a total time requirement that is not much longer than that of other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into main memory.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, Join synopses for approximate query answering, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.275-286, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
2
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
[doi> 10.1145/303976.303978]
|
| |
3
|
Alon, N., Gibbons, P. B., Matias, Y., and Szegedy, M. 2002. Tracking join and self-join sizes in limited storage. J. Comput. Syst. Sci. 64, 3, 719--747.
|
 |
4
|
Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, On random sampling over joins, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.263-274, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
5
|
Cochran, W. 1977. Sampling Techniques. John Wiley and Sons.
|
 |
6
|
|
| |
7
|
Dittrich, J-P., Seeger, B., Taylor, D. S., and Widmayer, P. 2002. Progressive merge join: A generic and non-blocking sort-based join Algorithm. VLDB Conference 299--310.
|
 |
8
|
Jens-Peter Dittrich , Bernhard Seeger , David Scot Taylor , Peter Widmayer, On producing join results early, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.134-142, June 09-11, 2003, San Diego, California
[doi> 10.1145/773153.773167]
|
 |
9
|
|
 |
10
|
|
 |
11
|
Sumit Ganguly , Phillip B. Gibbons , Yossi Matias , Avi Silberschatz, Bifocal sampling for skew-resistant join size estimation, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.271-281, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
Joseph M. Hellerstein , Ron Avnur , Andy Chou , Christian Hidber , Chris Olston , Vijayshankar Raman , Tali Roth , Peter J. Haas, Interactive Data Analysis: The Control Project, Computer, v.32 n.8, p.51-59, August 1999
[doi> 10.1109/2.781635]
|
 |
16
|
Joseph M. Hellerstein , Peter J. Haas , Helen J. Wang, Online aggregation, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.171-182, May 11-15, 1997, Tucson, Arizona, United States
|
 |
17
|
|
 |
18
|
|
 |
19
|
|
 |
20
|
Zachary G. Ives , Daniela Florescu , Marc Friedman , Alon Levy , Daniel S. Weld, An adaptive query execution system for data integration, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.299-310, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
21
|
Christopher Jermaine , Alin Dobra , Subramanian Arumugam , Shantanu Joshi , Abhijit Pol, A disk-based join with probabilistic guarantees, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
[doi> 10.1145/1066157.1066222]
|
| |
22
|
|
 |
23
|
|
 |
24
|
Richard J. Lipton , Jeffrey F. Naughton , Donovan A. Schneider, Practical selectivity estimation through adaptive sampling, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.1-11, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
|
| |
29
|
Olken, F. 1993. Raridom sampling from databases. PhD thesis, University of California, Berkeley, CA.
|
| |
30
|
Shao, J. 1999. Mathematical Statistics, Springer-Verlag.
|
 |
31
|
|
| |
32
|
Urhan, T. and Franklin, M. J. 2000. XJoin: A reactively-scheduled pipelined join operator. IEEE Data Eng. Bull. 23, 2, 27--33.
|
|