|
ABSTRACT
Data prefetching effectively reduces the negative effects of long load latencies on the performance of modern processors. Hardware prefetchers employ hardware structures to predict future memory addresses based on previous patterns. Thread-based prefetchers use portions of the actual program code to determine future load addresses for prefetching.This paper proposes the use of a pointer cache, which tracks pointer transitions, to aid prefetching. The pointer cache provides, for a given pointer's effective address, the base address of the object pointed to by the pointer. We examine using the pointer cache in a wide issue superscalar processor as a value predictor and to aid prefetching when a chain of pointers is being traversed. When a load misses in the L1 cache, but hits in the pointer cache, the first two cache blocks of the pointed to object are prefetched. In addition, the load's dependencies are broken by using the pointer cache hit as a value prediction.We also examine using the pointer cache to allow speculative precomputation to run farther ahead of the main thread of execution than in prior studies. Previously proposed thread-based prefetchers are limited in how far they can run ahead of the main thread when traversing a chain of recurrent dependent loads. When combined with the pointer cache, a speculative thread can make better progress ahead of the main thread, rapidly traversing data structures in the face of cache misses caused by pointer transitions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
 |
6
|
Jamison D. Collins , Hong Wang , Dean M. Tullsen , Christopher Hughes , Yong-Fong Lee , Dan Lavery , John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, Proceedings of the 28th annual international symposium on Computer architecture, p.14-25, June 30-July 04, 2001, Göteborg, Sweden
|
 |
7
|
|
 |
8
|
Keith I. Farkas , Paul Chow , Norman P. Jouppi , Zvonko Vranesic, Memory-system design considerations for dynamically-scheduled processors, Proceedings of the 24th annual international symposium on Computer architecture, p.133-143, June 01-04, 1997, Denver, Colorado, United States
|
 |
9
|
|
 |
10
|
|
| |
11
|
M. Karlsson, P. Dahgren, and P. Stenstrom. A prefetching technique for irregular accesses to linked data structures. In Proceedings of the Sixth International Symposium on High Performance Computer Architecture, January 2000.
|
 |
12
|
|
 |
13
|
|
 |
14
|
|
 |
15
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
 |
16
|
Amir Roth , Andreas Moshovos , Gurindar S. Sohi, Dependence based prefetching for linked data structures, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.115-126, October 02-07, 1998, San Jose, California, United States
|
 |
17
|
|
| |
18
|
|
| |
19
|
A. Roth, C. B. Zilles, and G. S. Sohi. Micro-architectural miss/execute decoupling. In International Workshop on Memory access Decoupled Architectures and Related Issues, October 2000.
|
 |
20
|
André Seznec , Stephen Felix , Venkata Krishnan , Yiannakis Sazeides, Design tradeoffs for the Alpha EV8 conditional branch predictor, Proceedings of the 29th annual international symposium on Computer architecture, p.295, May 25-29, 2002, Anchorage, Alaska
|
| |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
Y. Song and M. Dubois. Assisted execution. Technical Report CENG 98-25, University of Southern California, October 1988.
|
| |
25
|
D.M. Tullsen. Simulation and modeling of a simultaneous multithreading processor. In 22nd Annual Computer Measurement Group Conference, December 1996.
|
 |
26
|
Dean M. Tullsen , Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, p.191-202, May 22-24, 1996, Philadelphia, Pennsylvania, United States
|
 |
27
|
|
 |
28
|
|
CITED BY 16
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Luis M. Ramos , José Luis Briz , Pablo E. Ibáñez , Victor Viñals, Data prefetching in a cache hierarchy with high bandwidth and capacity, Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures, p.37-44, September 16-20, 2006, Seattle, Washington
|
|
|
Zhen Yang , Xudong Shi , Feiqi Su , Jih-Kwon Peir, Overlapping dependent loads with addressless preload, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
|
|
|
|
|
|
|
|
|
Jiwei Lu , Howard Chen , Rao Fu , Wei-Chung Hsu , Bobbie Othmer , Pen-Chung Yew , Dong-Yuan Chen, The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System, Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, p.180, December 03-05, 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|