ACM Home Page
Please provide us with feedback. Feedback
Exploiting join cardinality for faster hash joins
Full text PdfPdf (344 KB)
Source
Symposium on Applied Computing archive
Proceedings of the 2009 ACM symposium on Applied Computing table of contents
Honolulu, Hawaii
SESSION: Data theory, technology, and applications track table of contents
Pages 1549-1554  
Year of Publication: 2009
ISBN:978-1-60558-166-8
Authors
Michael Henderson  University of British Columbia, Okanagan
Bryce Cutt  University of British Columbia, Okanagan
Ramon Lawrence  University of British Columbia, Okanagan
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 39,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1529282.1529629
What is a DOI?

ABSTRACT

Hash joins combine massive relations in data warehouses, decision support systems, and scientific data stores. Faster hash join performance significantly improves query throughput, response time, and overall system performance. In this work, we demonstrate how using join cardinality improves hash join performance. The key contribution is the development of an algorithm to determine join cardinality in an arbitrary query plan. We implemented early hash join and the join cardinality algorithm in PostgreSQL. Experimental results demonstrate that early hash join has an immediate response time that is an order of magnitude faster than the existing hybrid hash join implementation. One-to-one joins execute up to 50% faster and perform significantly fewer I/Os, and one-to-many joins have similar or better performance over all memory sizes.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
TPC-H Benchmark. Technical report, Transaction Processing Performance Council.
 
2
S. Chaudhuri and V. Narasayya. TPC-D data generation with skew. Technical report, Microsoft Research, Available at: ftp.research.microsoft.com/users/viveknar/tpcdskew.
3
 
4
D. DeWitt and J. Naughton. Dynamic Memory Hybrid Hash Join. Technical report, University of Wisconsin, 1995.
 
5
G. Graefe. Five Performance Enhancements for Hybrid Hash Join. Technical Report CU-CS-606-92, University of Colorado at Boulder, 1992.
 
6
 
7
 
8
 
9
 
10
T. Urhan and M. Franklin. XJoin: A Reactively Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, 23(2): 7--18, 2000.
 
11

Collaborative Colleagues:
Michael Henderson: colleagues
Bryce Cutt: colleagues
Ramon Lawrence: colleagues