|
ABSTRACT
While many parallel applications exhibit good spatial locality, other important codes in areas like graph problem-solving or CAD do not. Often, these irregular codes contain small records accessed via pointers. Consequently, while the former applications benefit from long cache lines, the latter prefer short lines. One good solution is to combine short lines with prefetching. In this way, each application can exploit the amount of spatial locality that it has. However, prefetching, if provided, should also work for the irregular codes. This paper presents a new prefetching scheme that, while usable by regular applications, is specifically targeted to irregular ones: Memory Binding and Group Prefetching.The idea is to hardware-bind and prefetch together groups of data that the programmer suggests are strongly related to each other. Examples are the different fields in a record or two records linked by a permanent pointer. This prefetching scheme, combined with short cache lines, results in a memory hierarchy design that can be exploited by both regular and irregular applications. Overall, it is better to use a system with short lines (16-32 bytes) and our prefetching than a system with long lines (128 bytes) with or without our prefetching. The former system runs 6 out of 7 Splash-class applications faster. In particular, some of the most irregular applications run 25-40% faster.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
David Callahan , Ken Kennedy , Allan Porterfield, Software prefetching, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.40-52, April 08-11, 1991, Santa Clara, California, United States
|
 |
3
|
Rohit Chandra , Kourosh Gharachorloo , Vijayaraghavan Soundararajan , Anoop Gupta, Performance evaluation of hybrid hardware and software distributed shared memory protocols, Proceedings of the 8th international conference on Supercomputing, p.274-288, July 11-15, 1994, Manchester, England
[doi> 10.1145/181181.181543]
|
 |
4
|
|
 |
5
|
William Y. Chen , Scott A. Mahlke , Pohua P. Chang , Wen-mei W. Hwu, Data access microarchitectures for superscalar processors with compiler-assisted data prefetching, Proceedings of the 24th annual international symposium on Microarchitecture, p.69-73, September 1991, Albuquerque, New Mexico, Puerto Rico
[doi> 10.1145/123465.123478]
|
| |
6
|
F. Dahlgren, M. Dubois, and P. Stenstrom. Fixed and Adaptive Sequential Prefetching in Sha.red-Memory Multiprocessors. In Proceedings of the 1993 International Conference on Parallel Processing~ pages I:56-63, August 1993.
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
 |
11
|
|
 |
12
|
|
| |
13
|
|
 |
14
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
 |
18
|
|
 |
19
|
Steven Cameron Woo , Jaswinder Pal Singh , John L. Hennessy, The performance advantages of integrating block data transfer in cache-coherent multiprocessors, Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.219-229, October 05-07, 1994, San Jose, California, United States
|
CITED BY 20
|
|
|
|
|
|
|
|
|
|
|
|
|
Tong Chen , Tao Zhang , Zehra Sura , Mar Gonzales Tallada, Prefetching irregular references for software cache on cell, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE Design Automation Conference on
Gwo-Dong Chen
, Daniel D. Gajski
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
|