CAVA: Using checkpoint-assisted value prediction to hide L2 misses
Full text: PDF (647 KB)
Source: ACM Transactions on Architecture and Code Optimization (TACO)
Volume 3, Issue 2 (June 2006)
Pages: 182-208
Year of Publication: 2006
ISSN: 1544-3566
Authors
Luis Ceze, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
Karin Strauss, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
James Tuck, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
Josep Torrellas, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
Jose Renau, University of California, Santa Cruz, Santa Cruz, CA
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11, Downloads (12 Months): 56, Citation Count: 6
DOI: http://doi.acm.org/10.1145/1138035.1138038

ABSTRACT

Modern superscalar processors often suffer long stalls because of load misses in on-chip L2 caches. To address this problem, we propose hiding L2 misses with Checkpoint-Assisted VAlue prediction (CAVA). On an L2 cache miss, a predicted value is returned to the processor. When the missing load finally reaches the head of the ROB, the processor checkpoints its state, retires the load, and continues execution speculatively with the predicted value. When the value in memory arrives at the L2 cache, it is compared to the predicted value. If the prediction was correct, speculation has succeeded and execution continues; otherwise, execution is rolled back and restarted from the checkpoint. CAVA uses fast checkpointing, speculative buffering, and a modest-sized value prediction structure that has about 50% accuracy. Compared to an aggressive superscalar processor, CAVA speeds up execution by up to 1.45 for SPECint applications and 1.58 for SPECfp applications, with a geometric mean of 1.14 for SPECint and 1.34 for SPECfp applications. We also evaluate an implementation of Runahead execution, a previously proposed scheme that does not perform value prediction and discards all work done between checkpoint and data reception from memory. Runahead execution speeds up execution by a geometric mean of 1.07 for SPECint and 1.18 for SPECfp applications, compared to the same baseline.
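
The predict, checkpoint, compare, and roll back sequence described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the last-value predictor, the register dictionary, and the names LastValuePredictor, run_missing_load, and dependent_work are all hypothetical and chosen for illustration, and the abstract does not say which prediction scheme CAVA actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class LastValuePredictor:
    """Hypothetical predictor: guesses the last value seen at an address (0 if none)."""
    table: dict = field(default_factory=dict)

    def predict(self, addr):
        return self.table.get(addr, 0)

    def train(self, addr, actual):
        self.table[addr] = actual


def run_missing_load(regs, addr, memory, predictor, dependent_work):
    """Model one load that misses in the L2 cache.

    regs: dict standing in for the processor state that gets checkpointed
    dependent_work: function(regs, loaded_value) that mutates regs
    Returns "commit" if the prediction matched, "rollback" otherwise.
    """
    predicted = predictor.predict(addr)
    checkpoint = dict(regs)            # checkpoint state before retiring the load
    dependent_work(regs, predicted)    # continue speculatively with the guess

    actual = memory[addr]              # the real value arrives from memory
    predictor.train(addr, actual)
    if actual == predicted:
        return "commit"                # speculation succeeded, keep the work

    regs.clear()
    regs.update(checkpoint)            # roll back to the checkpoint
    dependent_work(regs, actual)       # restart with the correct value
    return "rollback"
```

In this model a first access to an address mispredicts (the table is empty, so the guess is 0) and pays for a rollback, while a repeated access to an unchanged address commits without re-execution. Runahead execution, by contrast, would correspond to always discarding the speculative work once the data arrives.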



