ABSTRACT
Pointer-chasing applications tend to traverse composite data structures consisting of multiple independent pointer chains. While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism. This article investigates exploiting such interchain memory parallelism for the purpose of memory latency tolerance, using a technique called multi-chain prefetching. Previous works [Roth et al. 1998; Roth and Sohi 1999] have proposed prefetching simple pointer-based structures in a multi-chain fashion. However, our work enables multi-chain prefetching for arbitrary data structures composed of lists, trees, and arrays.

This article makes five contributions in the context of multi-chain prefetching. First, we introduce a framework for compactly describing linked data structure (LDS) traversals, providing the data layout and traversal code work information necessary for prefetching. Second, we present an off-line scheduling algorithm for computing a prefetch schedule from the LDS descriptors that overlaps serialized cache misses across separate pointer-chain traversals. Our analysis focuses on static traversals. We also propose using speculation to identify independent pointer chains in dynamic traversals. Third, we propose a hardware prefetch engine that traverses pointer-based data structures and overlaps multiple pointer chains according to the computed prefetch schedule. Fourth, we present a compiler that extracts LDS descriptors via static analysis of the application source code, thus automating multi-chain prefetching. Finally, we conduct an experimental evaluation of compiler-instrumented multi-chain prefetching and compare it against jump pointer prefetching [Luk and Mowry 1996], prefetch arrays [Karlsson et al. 2000], and predictor-directed stream buffers (PSB) [Sherwood et al. 2000].

Our results show that compiler-instrumented multi-chain prefetching improves execution time by 40% across six pointer-chasing kernels from the Olden benchmark suite [Rogers et al. 1995], and by 3% across four SPECint2000 benchmarks. Compared to jump pointer prefetching and prefetch arrays, multi-chain prefetching achieves 34% and 11% higher performance for the selected Olden and SPECint2000 benchmarks, respectively. Compared to PSB, multi-chain prefetching achieves 27% higher performance for the selected Olden benchmarks, but PSB outperforms multi-chain prefetching by 0.2% for the selected SPECint2000 benchmarks. An ideal PSB with an infinite Markov predictor achieves comparable performance to multi-chain prefetching, coming within 6% across all benchmarks. Finally, speculation can enable multi-chain prefetching for some dynamic traversal codes, but our technique loses its effectiveness when the pointer-chain traversal order is highly dynamic.
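As a rough illustration of the interchain memory parallelism described above, the following sketch compares a fully serialized traversal with one in which independent chains are overlapped. It is a toy latency model under stated assumptions (one cache miss per node, unlimited memory bandwidth, no computation between nodes), not the article's scheduling algorithm; the function names, chain lengths, and latency value are hypothetical.

```python
# Toy latency model (illustrative assumption, not the article's scheduler).
# Fetching a node costs one full miss latency, and node i+1 of a chain
# cannot be requested until node i has arrived.

def serial_cycles(chain_lengths, miss_latency):
    """Traverse the chains one after another: every miss is exposed."""
    return sum(chain_lengths) * miss_latency


def overlapped_cycles(chain_lengths, miss_latency):
    """Traverse all chains concurrently. Misses within a chain still
    serialize, so the longest chain determines the total time."""
    return max(chain_lengths, default=0) * miss_latency


chains = [8, 8, 8, 8]   # four independent lists of eight nodes (hypothetical)
latency = 100           # cycles per cache miss (hypothetical)

print(serial_cycles(chains, latency))      # 3200
print(overlapped_cycles(chains, latency))  # 800
```

In this idealized setting the overlapped traversal is bounded by the longest chain rather than by the total node count, which is the source of latency tolerance that multi-chain prefetching aims to exploit.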
REFERENCES
[2] Burger, D. and Austin, T. M. 1997. The SimpleScalar tool set, version 2.0. Tech. rep. CS TR 1342, University of Wisconsin-Madison, Madison, WI.
[3] David Callahan, Ken Kennedy, Allan Porterfield, Software prefetching, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.40-52, April 08-11, 1991, Santa Clara, California, United States
[4] Charney, M. J. and Reeves, A. P. 1995. Generalized correlation based hardware prefetching. Tech. rep. EE CEG 95-100.
[9] Jamison D. Collins, Hong Wang, Dean M. Tullsen, Christopher Hughes, Yong-Fong Lee, Dan Lavery, John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, Proceedings of the 28th annual international symposium on Computer architecture, p.14-25, June 30-July 04, 2001, Göteborg, Sweden
[10] John W. C. Fu, Janak H. Patel, Bob L. Janssens, Stride directed prefetching in scalar processors, Proceedings of the 25th annual international symposium on Microarchitecture, p.102-110, December 01-04, 1992, Portland, Oregon, United States
[11] Susan L. Graham, Peter B. Kessler, Marshall K. McKusick, Gprof: A call graph execution profiler, Proceedings of the 1982 SIGPLAN symposium on Compiler construction, p.120-126, June 23-25, 1982, Boston, Massachusetts, United States
[12] Mary W. Hall, Jennifer M. Anderson, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, Edouard Bugnion, Monica S. Lam, Maximizing Multiprocessor Performance with the SUIF Compiler, Computer, v.29 n.12, p.84-89, December 1996. doi: 10.1109/2.546613
[15] Karlsson, M., Dahlgren, F., and Stenstrom, P. 2000. A prefetching technique for irregular accesses to linked data structures. In Proceedings of the 6th International Conference on High Performance Computer Architecture (Toulouse, France). ACM Press, New York, NY.
[20] Steve S.W. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner, Daniel Lavery, John P. Shen, Post-pass binary adaptation for software-based speculative precomputation, Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, June 17-19, 2002, Berlin, Germany
[23] Lyle, J. R. and Wallace, D. R. 1997. Using the unravel program slicing tool to evaluate high integrity software. In Proceedings of 10th International Software Quality Week.
[31] Amir Roth, Andreas Moshovos, Gurindar S. Sohi, Dependence based prefetching for linked data structures, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.115-126, October 02-07, 1998, San Jose, California, United States
[35] Artour Stoutchinin, José N. Amaral, Guang R. Gao, James C. Dehnert, Suneel Jain, Alban Douillet, Speculative Prefetching of Induction Pointers, Proceedings of the 10th International Conference on Compiler Construction, p.289-303, April 02-06, 2001
[37] Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, p.191-202, May 22-24, 1996, Philadelphia, Pennsylvania, United States