Scalable selective re-execution for EDGE architectures
Source: Architectural Support for Programming Languages and Operating Systems
Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Boston, MA, USA
SESSION: Architecture
Pages: 120-132
Year of Publication: 2004
ISBN: 1-58113-804-0
Authors
Rajagopalan Desikan  The University of Texas at Austin
Simha Sethumadhavan  The University of Texas at Austin
Doug Burger  The University of Texas at Austin
Stephen W. Keckler  The University of Texas at Austin
Sponsors
SIGPLAN: ACM Special Interest Group on Programming Languages
SIGOPS: ACM Special Interest Group on Operating Systems
SIGARCH: ACM Special Interest Group on Computer Architecture
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1024393.1024408

ABSTRACT

Pipeline flushes are becoming increasingly expensive in modern microprocessors with large instruction windows and deep pipelines. Selective re-execution is a technique that can reduce the penalty of mis-speculations by re-executing only instructions affected by the mis-speculation, instead of all instructions. In this paper we introduce a new selective re-execution mechanism that exploits the properties of a dataflow-like Explicit Data Graph Execution (EDGE) architecture to support efficient mis-speculation recovery, while scaling to window sizes of thousands of instructions with high performance. This distributed selective re-execution (DSRE) protocol permits multiple speculative waves of computation to be traversing a dataflow graph simultaneously, with a commit wave propagating behind them to ensure correct execution. We evaluate one application of this protocol to provide efficient recovery for load-store dependence speculation. Unlike traditional dataflow architectures which resorted to single-assignment memory semantics, the DSRE protocol combines dataflow execution with speculation to enable high performance and conventional sequential memory semantics. Our experiments show that the DSRE protocol results in an average 17% speedup over the best dependence predictor proposed to date, and obtains 82% of the performance possible with a perfect oracle directing the issue of loads.





REVIEW

Reviewer: Arvid G. Larson

Pipeline architectures with expansive microarchitectural structures, using increasingly large instruction windows and look-ahead depths, are often limited in throughput efficiency. This limitation is primarily due to frequent misspeculation of ins...
