| Dead-block prediction & dead-block correlating prefetchers |
| Source |
International Symposium on Computer Architecture
Proceedings of the 28th annual international symposium on Computer architecture
Göteborg, Sweden
Pages: 144-154
Year of Publication: 2001
ISBN: 0-7695-1162-7
| Authors |
An-Chow Lai, Electrical & Computer Engineering, Purdue University, West Lafayette, IN
Cem Fide, Sun Microsystems, 901 San Antonio Rd, Palo Alto, CA
Babak Falsafi, Electrical & Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
| Bibliometrics |
Downloads (6 Weeks): 28, Downloads (12 Months): 107, Citation Count: 37
ABSTRACT
Effective data prefetching requires accurate mechanisms to predict both “which” cache blocks to prefetch and “when” to prefetch them. This paper proposes Dead-Block Predictors (DBPs), trace-based predictors that accurately identify “when” an L1 data cache block becomes evictable, or “dead”. Predicting a dead block significantly enhances prefetching lookahead and opportunity, and enables placing data directly into L1, obviating the need for auxiliary prefetch buffers. This paper also proposes Dead-Block Correlating Prefetchers (DBCPs), which use address correlation to predict “which” subsequent block to prefetch when a block becomes evictable. A DBCP enables effective data prefetching in a wide spectrum of pointer-intensive, integer, and floating-point applications.
We use cycle-accurate simulation of an out-of-order superscalar processor and memory-intensive benchmarks to show that: (1) dead-block prediction enhances prefetching lookahead by at least an order of magnitude compared to previous techniques, (2) a DBP can predict dead blocks with an average coverage of 90% while mispredicting only 4% of the time, (3) a DBCP offers an address prediction coverage of 86% while mispredicting only 3% of the time, and (4) DBCPs improve performance by 62% on average and 282% at best in the benchmarks we studied.
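The abstract describes the two mechanisms only at a high level, so the following is a minimal illustrative sketch rather than the authors' design. It assumes a direct-mapped cache, a trace signature formed by summing the program counters (PCs) of the instructions that touch a block, and a small saturating confidence counter. The names `ToyDBCP`, `SIG_MASK`, `CONF_MAX`, and `CONF_THRESHOLD` are hypothetical. When a block is evicted, the sketch records the signature the block held at its last touch together with the block that replaced it. When the same signature recurs with enough confidence, it reports the block as dead ("when") and returns the recorded successor as the prefetch candidate ("which").

```python
from typing import Dict, Optional, Tuple

SIG_MASK = 0xFFFF    # assumption: 16-bit trace signature
CONF_MAX = 3         # assumption: 2-bit saturating confidence counter
CONF_THRESHOLD = 2   # assumption: predict "dead" at or above this value


class ToyDBCP:
    """Toy dead-block predictor with a correlated next-block table."""

    def __init__(self, num_sets: int, block_size: int = 32) -> None:
        self.num_sets = num_sets
        self.block_size = block_size
        # set index -> (resident block, signature of PCs seen so far)
        self.resident: Dict[int, Tuple[int, int]] = {}
        # (block, last-touch signature) -> [confidence, successor block]
        self.history: Dict[Tuple[int, int], list] = {}

    def access(self, pc: int, addr: int) -> Tuple[bool, Optional[int]]:
        """Simulate one reference; return (predicted_dead, prefetch_block)."""
        block = addr // self.block_size
        index = block % self.num_sets
        entry = self.resident.get(index)

        if entry is not None and entry[0] == block:
            # Hit: the old signature was not a last touch, so lower its confidence.
            learned = self.history.get(entry)
            if learned is not None:
                learned[0] = max(0, learned[0] - 1)
            sig = (entry[1] + pc) & SIG_MASK
        else:
            if entry is not None:
                # Miss evicts the resident block: learn its last-touch
                # signature and the block that replaced it.
                learned = self.history.setdefault(entry, [0, None])
                learned[0] = min(CONF_MAX, learned[0] + 1)
                learned[1] = block
            sig = pc & SIG_MASK  # a new block starts a new trace

        self.resident[index] = (block, sig)

        learned = self.history.get((block, sig))
        if learned is not None and learned[0] >= CONF_THRESHOLD:
            return True, learned[1]
        return False, None
```

In this sketch a block is reported dead immediately after the access that is expected to be its last, which is what would let a prefetcher place the successor directly into the freed cache frame instead of a separate buffer. A real design would also need set associativity, bounded table sizes, and a policy for issuing the prefetch, none of which are modeled here.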
CITED BY 38
Harry Dwyer , John Fernando, Establishing a tight bound on task interference in embedded system instruction caches, Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems, November 16-17, 2001, Atlanta, Georgia, USA
Philo Juang , Kevin Skadron , Margaret Martonosi , Zhigang Hu , Douglas W. Clark , Philip W. Diodato , Stefanos Kaxiras, Implementing branch-predictor decay using quasi-static memory cells, ACM Transactions on Architecture and Code Optimization (TACO), v.1 n.2, p.180-219, June 2004
W. Zhang , M. Kandemir , A. Sivasubramaniam , M. J. Irwin, Performance, energy, and reliability tradeoffs in replicating hot cache lines, Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, October 30-November 01, 2003, San Jose, California, USA
Stephen Somogyi , Thomas F. Wenisch , Nikolaos Hardavellas , Jangwoo Kim , Anastassia Ailamaki , Babak Falsafi, Memory coherence activity prediction in commercial workloads, Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture, p.37-45, June 20-20, 2004, Munich, Germany
Sorin Iacobovici , Lawrence Spracklen , Sudarshan Kadambi , Yuan Chou , Santosh G. Abraham, Effective stream-based and execution-based data prefetching, Proceedings of the 18th annual international conference on Supercomputing, June 26-July 01, 2004, Malo, France
Chi-Keung Luk , Robert Muth , Harish Patil , Richard Weiss , P. Geoffrey Lowney , Robert Cohn, Profile-guided post-link stride prefetching, Proceedings of the 16th international conference on Supercomputing, June 22-26, 2002, New York, New York, USA
Luis M. Ramos , José Luis Briz , Pablo E. Ibáñez , Victor Viñals, Data prefetching in a cache hierarchy with high bandwidth and capacity, Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures, p.37-44, September 16-20, 2006, Seattle, Washington
Arijit Biswas , Paul Racunas , Razvan Cheveresan , Joel Emer , Shubhendu S. Mukherjee , Ram Rangan, Computing Architectural Vulnerability Factors for Address-Based Structures, ACM SIGARCH Computer Architecture News, v.33 n.2, p.532-543, May 2005
Akihiro Yamamoto , Yusuke Tanaka , Hideki Ando , Toshio Shimada, Data prefetching and address pre-calculation through instruction pre-execution with two-step physical register deallocation, Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, p.33-40, September 16-16, 2007, Brasov, Romania
Ismail Kadayif , Ayhan Zorlubas , Selcuk Koyuncu , Olcay Kabal , Davut Akcicek , Yucel Sahin , Mahmut Kandemir, Capturing and optimizing the interactions between prefetching and cache line turnoff, Microprocessors & Microsystems, v.32 n.7, p.394-404, October, 2008
Lingxiang Xiang , Tianzhou Chen , Qingsong Shi , Wei Hu, Less reused filter: improving l2 cache performance via filtering less reused lines, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA