|
ABSTRACT
Hash join algorithms suffer from extensive CPU cache stalls. This article shows that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over 80% of its user time stalled on CPU cache misses, and explores the use of CPU cache prefetching to improve its cache performance. Applying prefetching to hash joins is complicated by the data dependencies, multiple code paths, and inherent randomness of hashing. We present two techniques, group prefetching and software-pipelined prefetching, that overcome these complications. These schemes achieve 1.29--4.04X speedups for the join phase and 1.37--3.49X speedups for the partition phase over GRACE and simple prefetching approaches. Moreover, compared with previous cache-aware approaches (i.e. cache partitioning), the schemes are at least 36% faster on large relations and do not require exclusive use of the CPU cache to be effective. Finally, comparing the elapsed real times when disk I/Os are in the picture, our cache prefetching schemes achieve 1.12--1.84X speedups for the join phase and 1.06--1.60X speedups for the partition phase over the GRACE hash join algorithm.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
Shimin Chen , Phillip B. Gibbons , Todd C. Mowry, Improving index performance through prefetching, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.235-246, May 21-24, 2001, Santa Barbara, California, United States
|
 |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
Hepkin, D. 2006. Guide to multiple page size support on AIX 5L version 5.3. http://www-03.ibm.com/servers/aix/whitepapers/multiple_page.pdf.
|
| |
11
|
IBM Corporation. 2004. IBM DB2 Universal Database Administration Guide Version 8.2.
|
| |
12
|
Intel Corporation. 2004. Intel Itanium 2 Processor Reference Manual For Software Development and Optimization. Intel Corporation. Order Number: 251110-003.
|
| |
13
|
Intel Corporation. 2006. IA-32 Intel Architecture Software Developer's Manual (Volumn 3A and 3B): System Programming Guide. Intel Corporation.
|
| |
14
|
J. A. Kahle , M. N. Day , H. P. Hofstee , C. R. Johns , T. R. Maeurer , D. Shippy, Introduction to the cell multiprocessor, IBM Journal of Research and Development, v.49 n.4/5, p.589-604, July 2005
|
| |
15
|
Kalla, R. N., Sinharoy, B., and Tendler, J. M. 2004. IBM Power5 chip: A dual-core multithreaded processor. IEEE Micro 24, 2 (Mar./Apr.), 40--47.
|
| |
16
|
|
| |
17
|
Kitsuregawa, M., Tanaka, H., and Moto-Oka, T. 1983. Application of hash to data base machine and its architecture. New Generation Comput. 1, 1, 63--74.
|
| |
18
|
|
| |
19
|
Lindsay, B. 2002. Hash joins in DB2 UDB: The inside story. Carnegie Mellon DB Seminar.
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
McDougall, R. 2004. Supporting multiple page sizes in the solaris operating system. http://www.sun.com/blueprints/0304/817-5917.pdf.
|
| |
24
|
McFarling, S. 1993. Combining branch predictors. Tech. Rep. WRL Technical Note TN-36, Digital Equipment Corporation (June).
|
| |
25
|
|
 |
26
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
| |
27
|
|
| |
28
|
Perfmon Project. http://www.hpl.hp.com/research/linux/perfmon/index.php4.
|
 |
29
|
|
 |
30
|
|
| |
31
|
|
| |
32
|
Sun Microsystems. UltraSPARC IV Processor Architecture Overview. Sun Microsystems. Technical Whitepaper, Version 1.0, Feb. 2004.
|
| |
33
|
TPC Benchmarks. http://www.tpc.org/.
|
| |
34
|
|
| |
35
|
|
|