CAVA: Using checkpoint-assisted value prediction to hide L2 misses
Source
ACM Transactions on Architecture and Code Optimization (TACO)
Volume 3, Issue 2 (June 2006)
Pages: 182-208
Year of Publication: 2006
ISSN: 1544-3566
Authors
Luis Ceze, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
Karin Strauss, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
James Tuck, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
Josep Torrellas, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
Jose Renau, University of California, Santa Cruz, Santa Cruz, CA
ABSTRACT
Modern superscalar processors often suffer long stalls because of load misses in on-chip L2 caches. To address this problem, we propose hiding L2 misses with Checkpoint-Assisted VAlue prediction (CAVA). On an L2 cache miss, a predicted value is returned to the processor. When the missing load finally reaches the head of the ROB, the processor checkpoints its state, retires the load, and speculatively uses the predicted value and continues execution. When the value in memory arrives at the L2 cache, it is compared to the predicted value. If the prediction was correct, speculation has succeeded and execution continues; otherwise, execution is rolled back and restarted from the checkpoint. CAVA uses fast checkpointing, speculative buffering, and a modest-sized value prediction structure that has about 50% accuracy. Compared to an aggressive superscalar processor, CAVA speeds up execution by a factor of up to 1.45 for SPECint applications and 1.58 for SPECfp applications, with a geometric mean of 1.14 for SPECint and 1.34 for SPECfp applications. We also evaluate an implementation of Runahead execution, a previously proposed scheme that does not perform value prediction and discards all work done between checkpoint and data reception from memory. Runahead execution speeds up execution by a geometric mean of 1.07 for SPECint and 1.18 for SPECfp applications, compared to the same baseline.
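The checkpoint, predict, validate, and roll-back sequence described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `ToyCore` and its methods are hypothetical names, the register file is a plain dictionary, and the model takes the checkpoint at the moment of the miss rather than when the load reaches the head of the ROB. After a rollback, the caller is expected to re-execute the discarded work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ToyCore:
    """Toy model (hypothetical) of checkpoint-assisted value prediction."""
    regs: Dict[str, int] = field(default_factory=dict)
    checkpoint: Optional[Dict[str, int]] = None
    pending: Optional[Tuple[str, int]] = None  # (destination register, predicted value)
    rollbacks: int = 0

    def load_miss(self, reg: str, predicted: int) -> None:
        # Simplification: checkpoint at miss time, then continue
        # speculatively with the predicted value in the destination register.
        self.checkpoint = dict(self.regs)
        self.regs[reg] = predicted
        self.pending = (reg, predicted)

    def execute(self, dst: str, fn: Callable[[Dict[str, int]], int]) -> None:
        # Speculative work that may depend on the predicted value.
        self.regs[dst] = fn(self.regs)

    def memory_returns(self, actual: int) -> bool:
        # Compare the value from memory with the prediction.
        if self.pending is None or self.checkpoint is None:
            raise RuntimeError("no outstanding predicted load")
        reg, predicted = self.pending
        self.pending = None
        if actual == predicted:
            # Correct prediction: keep all speculative work.
            self.checkpoint = None
            return True
        # Misprediction: discard speculative work, restore the checkpoint,
        # and install the real value so execution can restart from there.
        self.regs = dict(self.checkpoint)
        self.regs[reg] = actual
        self.checkpoint = None
        self.rollbacks += 1
        return False


# Correct prediction: the dependent result in r3 survives.
hit = ToyCore(regs={"r1": 10})
hit.load_miss("r2", predicted=0)
hit.execute("r3", lambda r: r["r1"] + r["r2"])
hit_ok = hit.memory_returns(actual=0)

# Wrong prediction: r3 is discarded and r2 holds the real value.
miss = ToyCore(regs={"r1": 10})
miss.load_miss("r2", predicted=0)
miss.execute("r3", lambda r: r["r1"] + r["r2"])
miss_ok = miss.memory_returns(actual=7)
```

In this sketch a correct prediction keeps the work done while the miss was outstanding, whereas a wrong prediction costs only the speculative work since the checkpoint. That contrasts with the Runahead scheme described above, which discards that work regardless.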