ACM Home Page
Please provide us with feedback. Feedback
A disk-based join with probabilistic guarantees
Full text PdfPdf (362 KB)
Source International Conference on Management of Data archive
Proceedings of the 2005 ACM SIGMOD international conference on Management of data table of contents
Baltimore, Maryland
SESSION: Research papers: estimation and approximation table of contents
Pages: 563 - 574  
Year of Publication: 2005
ISBN:1-59593-060-4
Authors
Christopher Jermaine  University of Florida, Gainesville
Alin Dobra  University of Florida, Gainesville
Subramanian Arumugam  University of Florida, Gainesville
Shantanu Joshi  University of Florida, Gainesville
Abhijit Pol  University of Florida, Gainesville
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 44,   Citation Count: 3
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1066157.1066222
What is a DOI?

ABSTRACT

One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm for computing the answer to such a query over large, disk-based input tables. The key innovation of our algorithm is that at all times, it provides an online, statistical estimator for the eventual answer to the query, as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy, or run the algorithm to completion with a total time requirement that is not much longer than other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into core memory.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
W. Cochran: Sampling Techniques. Wiley and Sons, 1977
3
 
4
J.-P. Dittrich, B. Seeger, D. S. Taylor, P. Widmayer: Progressive Merge Join: A Generic and Non-blocking Sort-based Join Algorithm. VLDB 2002: 299--310
5
6
 
7
 
8
 
9
10
11
12
13
 
14
15
 
16
17
 
18
J. Shao: Mathematical Statistics. Springer-Verlag, 1999.

Collaborative Colleagues:
Christopher Jermaine: colleagues
Alin Dobra: colleagues
Subramanian Arumugam: colleagues
Shantanu Joshi: colleagues
Abhijit Pol: colleagues