|
ABSTRACT
The expanding gap between microprocessor and DRAM performance has
necessitated the use of increasingly aggressive techniques designed to reduce or hide the latency of main memory access. Although large cache hierarchies have proven to be effective in reducing this latency for the most frequently used data, it is still not uncommon for many programs to spend more than half their run times stalled on memory requests. Data prefetching has been proposed as a technique for hiding the access latency of data referencing patterns that defeat caching strategies. Rather than waiting for a cache miss to initiate a memory fetch, data prefetching anticipates such misses and issues a fetch to the memory system in advance of the actual memory reference. To be effective, prefetching must be implemented in such a way that prefetches are timely, useful, and introduce little overhead. Secondary effects such as cache pollution and increased memory bandwidth requirements must also be taken into consideration. Despite these obstacles, prefetching has the potential to significantly improve overall program execution time by overlapping computation with memory accesses. Prefetching strategies are diverse, and no single strategy has yet been proposed that provides optimal performance. The following survey examines several alternative approaches, and discusses the design tradeoffs involved when implementing a data prefetch strategy.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
ANACKER,W.AND WANG, C. P. 1967. Performance evaluation of computing systems with memory hierarchies. IEEE Trans. Comput. 16,6, 764-773.
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
| |
7
|
CASMIRA,J.P.AND KAELI, D. R. 1995. Modeling cache pollution. In Proceedings of the Second IASTED Conference on Modeling and Simulation. 123-126.
|
| |
8
|
CHAN, K. K. 1996. Design of the HP PA 7200 CPU. Hewlett-Packard J. 47, 1, 25-33.
|
| |
9
|
CHANG,P.Y.,KAELI, D., AND LIU, Y. 1994. Branchdirected data cache prefetching. In Proceedings of Second Workshop on Shared-Memory Multiprocessing Systems.
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
 |
13
|
William Y. Chen , Scott A. Mahlke , Pohua P. Chang , Wen-mei W. Hwu, Data access microarchitectures for superscalar processors with compiler-assisted data prefetching, Proceedings of the 24th annual international symposium on Microarchitecture, p.69-73, September 1991, Albuquerque, New Mexico, Puerto Rico
[doi> 10.1145/123465.123478]
|
| |
14
|
|
| |
15
|
DAHLGREN, F., DUBOIS, M., AND STENSTROM,P. 1993. Fixed and adaptive sequential prefetching in shared-memory multiprocessors. In Proceedings of the International Conference on Parallel Processing (St. Charles, IL). 56-63.
|
| |
16
|
FU, B., SAINI, A., AND GELSINGER, P. P. 1989. Performance and microarchitecture of the i486 processor. In Proceedings of the IEEE International Conference on Computer Design (Cambridge, MA). 182-187.
|
 |
17
|
|
 |
18
|
John W. C. Fu , Janak H. Patel , Bob L. Janssens, Stride directed prefetching in scalar processors, Proceedings of the 25th annual international symposium on Microarchitecture, p.102-110, December 01-04, 1992, Portland, Oregon, United States
|
| |
19
|
GORNISH,E.H.AND VEIDENBAUM, A. V. 1994. An integrated hardware/software scheme for shared-memory multiprocessors. In Proceedings of International Conference on Parallel Processing (St. Charles, IL). 281-284.
|
 |
20
|
|
| |
21
|
HARRISON,L.AND MEHROTRA, S. 1994. A data prefetch mechanism for accelerating general computation. 1351. University of Illinois at Urbana-Champaign, Champaign, IL.
|
 |
22
|
|
 |
23
|
|
 |
24
|
|
| |
25
|
|
 |
26
|
|
| |
27
|
Mikko H. Lipasti , William J. Schmidt , Steven R. Kunkel , Robert R. Roediger, SPAID: software prefetching in pointer- and call-intensive environments, Proceedings of the 28th annual international symposium on Microarchitecture, p.231-236, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
| |
28
|
|
 |
29
|
|
| |
30
|
|
 |
31
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
| |
32
|
OBERLIN, S., KESSLER, R., SCOTT, S., AND THORSON, G. 1996. Cray T3E architecture overview. Cray Supercomputers, Chippewa Falls, MN.
|
 |
33
|
|
| |
34
|
|
| |
35
|
|
 |
36
|
|
 |
37
|
Amir Roth , Andreas Moshovos , Gurindar S. Sohi, Dependence based prefetching for linked data structures, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.115-126, October 02-07, 1998, San Jose, California, United States
|
 |
38
|
Vatsa Santhanam , Edward H. Gornish , Wei-Chung Hsu, Data prefetching on the HP PA-8000, Proceedings of the 24th annual international symposium on Computer architecture, p.264-273, June 01-04, 1997, Denver, Colorado, United States
|
 |
39
|
|
| |
40
|
SMITH, A. J. 1978. Sequential program prefetching in memory hierarchies. IEEE Computer 11, 12, 7-21.
|
 |
41
|
|
| |
42
|
|
| |
43
|
|
| |
44
|
|
| |
45
|
|
| |
46
|
YOUNG,H.C.AND SHEKITA, E. J. 1993. An intelligent I-cache prefetch mechanism. In Proceedings of the International Conference on Computer Design (ICCD'93, Cambridge, MA). IEEE Computer Society Press, Los Alamitos, CA, 44-49.
|
 |
47
|
|
CITED BY 39
|
|
|
|
|
Françoise Fabret , H. Arno Jacobsen , François Llirbat , Joăo Pereira , Kenneth A. Ross , Dennis Shasha, Filtering algorithms and implementation for very fast publish/subscribe systems, ACM SIGMOD Record, v.30 n.2, p.115-126, June 2001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Aviral Shrivastava , Eugene Earlie , Nikil Dutt , Alex Nicolau, Aggregating processor free time for energy reduction, Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, September 19-21, 2005, Jersey City, NJ, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Luis M. Ramos , José Luis Briz , Pablo E. Ibáñez , Victor Viñals, Data prefetching in a cache hierarchy with high bandwidth and capacity, Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures, p.37-44, September 16-20, 2006, Seattle, Washington
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Zhen Yang , Xudong Shi , Feiqi Su , Jih-Kwon Peir, Overlapping dependent loads with addressless preload, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
|
|
|
|
|
|
|
|
|
|
|
|
Tong Chen , Tao Zhang , Zehra Sura , Mar Gonzales Tallada, Prefetching irregular references for software cache on cell, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ismail Kadayif , Ayhan Zorlubas , Selcuk Koyuncu , Olcay Kabal , Davut Akcicek , Yucel Sahin , Mahmut Kandemir, Capturing and optimizing the interactions between prefetching and cache line turnoff, Microprocessors & Microsystems, v.32 n.7, p.394-404, October, 2008
|
|
|
|
|
|
Jacob Leverich , Hideho Arakida , Alex Solomatnikov , Amin Firoozshahian , Mark Horowitz , Christos Kozyrakis, Comparative evaluation of memory models for chip multiprocessors, ACM Transactions on Architecture and Code Optimization (TACO), v.5 n.3, p.1-30, November 2008
|
|
|
Seung Woo Son , Mahmut Kandemir , Mustafa Karakoy , Dhruva Chakrabarti, A compiler-directed data prefetching scheme for chip multiprocessors, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA
|
|
|
|
|
|
|
REVIEW
"Andrea Prati : Reviewer"
The growing gap between microprocessor and memory
performance has elected cache prefetching as a key research
issue in recent years. Vanderwiel and Lilja present in this
paper a survey on data prefetch mechanisms both for single
processor an
more...
|