|
ABSTRACT
Prefetching data ahead of use has the potential to tolerate the grow ing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisti cated prefetching techniques have been automated for limited domains, such as scientific codes that access dense arrays in loop nests, a similar level of success has eluded general-purpose pro grams, especially pointer-chasing codes written in languages such as C and C++. We address this problem by describing, implementing and evaluating a dynamic prefetching scheme. Our technique runs on stock hardware, is completely automatic, and works for general-purpose programs, including pointer-chasing codes written in weakly-typed languages, such as C and C++. It operates in three phases. First, the profiling phase gathers a temporal data reference profile from a running program with low-overhead. Next, the profiling is turned off and a fast analysis algorithm extracts hot data streams, which are data reference sequences that frequently repeat in the same order, from the temporal profile. Then, the system dynamically injects code at appropriate program points to detect and prefetch these hot data streams. Finally, the process enters the hibernation phase where no profiling or analysis is performed, and the program continues to execute with the added prefetch instructions. At the end of the hibernation phase, the program is de-optimized to remove the inserted checks and prefetch instructions, and control returns to the profiling phase. For long-running programs, this profile, analyze and optimize, hibernate, cycle will repeat multiple times. Our initial results from applying dynamic prefetching are promising, indicating overall execution time improvements of 5.19% for several memory-performance-limited SPECint2000 benchmarks running their largest (ref) inputs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Matthew Arnold , Stephen Fink , David Grove , Michael Hind , Peter F. Sweeney, Adaptive optimization in the Jalapeño JVM, Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, p.47-65, October 2000, Minneapolis, Minnesota, United States
|
 |
3
|
|
 |
4
|
Vasanth Bala , Evelyn Duesterwald , Sanjeev Banerjia, Dynamo: a transparent dynamic optimization system, Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, p.1-12, June 18-21, 2000, Vancouver, British Columbia, Canada
|
| |
5
|
|
| |
6
|
M. Charney, and A. Reeves. "Generalized correlation based hardware prefetching." Tech report EE-CEG-95-1, Cornell University, 1995
|
 |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
|
 |
11
|
Michał Cierniak , Guei-Yuan Lueh , James M. Stichnoth, Practicing JUDO: Java under dynamic optimizations, Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, p.13-26, June 18-21, 2000, Vancouver, British Columbia, Canada
|
| |
12
|
|
| |
13
|
D. Deaver, R. Gorton, and N. Rubin, "Wiggins/Redstone: An online program specializer.", In Hot Chips, 1999
|
 |
14
|
|
| |
15
|
M. Hirzel and T. Chilimbi. " Bursty Tracing: A Framework for Low-Overhead Temporal Profiling", In Workshop on Feed¿back-Directed and Dynamic Optimizations (FDDO), 2001
|
 |
16
|
|
 |
17
|
|
| |
18
|
M. Karlsson, F. Dahlgren, and P. Stenstrom. "A Prefetching Technique for Irregular Accesses to Linked Data Structures, In High Performance Computer Architectures (HPCA), 1999
|
 |
19
|
|
 |
20
|
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
 |
24
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
| |
25
|
M. Paleczny, C. Vick, and C. Click. "The Java HotSpot server compiler.", In USENIX Java Virtual Machine Research and Technology Symposium (JVM), 2001
|
 |
26
|
Amir Roth , Andreas Moshovos , Gurindar S. Sohi, Dependence based prefetching for linked data structures, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.115-126, October 02-07, 1998, San Jose, California, United States
|
 |
27
|
|
 |
28
|
|
| |
29
|
|
 |
30
|
|
 |
31
|
|
| |
32
|
A. Srivastava, A. Edwards, and H. Vo. "Vulcan: Binary trans¿formation in a distributed environment.", In Microsoft Re¿search Tech Report, MSR-TR-2001-50, 2001
|
| |
33
|
Artour Stoutchinin , José N. Amaral , Guang R. Gao , James C. Dehnert , Suneel Jain , Alban Douillet, Speculative Prefetching of Induction Pointers, Proceedings of the 10th International Conference on Compiler Construction, p.289-303, April 02-06, 2001
|
| |
34
|
D. Ung, and C. Cifuentes."Opimising hot paths in a dynamic binary translator."In Workshop on Binary Translation, 2000
|
 |
35
|
|
CITED BY 39
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Qiang Wu , Artem Pyatakov , Alexey Spiridonov , Easwaran Raman , Douglas W. Clark , David I. August, Exposing Memory Access Regularities Using Object-Relative Memory Profiling, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.315, March 20-24, 2004, Palo Alto, California
|
|
|
|
|
|
|
|
|
Kartik K. Agaram , Stephen W. Keckler , Calvin Lin , Kathryn S. McKinley, Decomposing memory performance: data structures and phases, Proceedings of the 2006 international symposium on Memory management, June 10-11, 2006, Ottawa, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Shashidhar Mysore , Banit Agrawal , Rodolfo Neuber , Timothy Sherwood , Nisheeth Shrivastava , Subhash Suri, Formulating and implementing profiling over adaptive ranges, ACM Transactions on Architecture and Code Optimization (TACO), v.5 n.1, p.1-32, May 2008
|
|
|
|
|
|
Jiwei Lu , Abhinav Das , Wei-Chung Hsu , Khoa Nguyen , Santosh G. Abraham, Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.93-104, November 12-16, 2005, Barcelona, Spain
|
|
|
|
|
|
|
|
|
|
|
|
Thomas F. Wenisch , Stephen Somogyi , Nikolaos Hardavellas , Jangwoo Kim , Anastassia Ailamaki , Babak Falsafi, Temporal Streaming of Shared Memory, ACM SIGARCH Computer Architecture News, v.33 n.2, p.222-233, May 2005
|
|
|
|
|
|
Jiwei Lu , Howard Chen , Rao Fu , Wei-Chung Hsu , Bobbie Othmer , Pen-Chung Yew , Dong-Yuan Chen, The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System, Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, p.180, December 03-05, 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|