Dual path instruction processing
Source: International Conference on Supercomputing
Proceedings of the 16th International Conference on Supercomputing
New York, New York, USA
Session: Architecture 2
Pages: 220-229
Year of Publication: 2002
ISBN:1-58113-483-5
Authors
Juan L. Aragón  Universidad de Murcia, Murcia (Spain)
José González  Universidad de Murcia, Murcia (Spain)
Antonio González  Universitat Politècnica de Catalunya, Barcelona (Spain)
James E. Smith  University of Wisconsin-Madison, Madison, WI
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/514191.514223

ABSTRACT

The reasons for performance losses due to conditional branch mispredictions are first studied. Branch misprediction penalties are broken into three categories: pipeline-fill penalty, window-fill penalty, and serialization penalty. The first and third of these produce most of the performance loss, but the second is also significant. Previously proposed dual (or multi) path execution methods attempt to reduce all three penalties, but these methods are also quite complex. Most of the complexity is caused by simultaneously executing instructions from multiple paths.

A good engineering compromise is to avoid the complexity of multiple path execution by focusing on methods that reduce only the pipeline-fill and window-fill penalties. Dual Path Instruction Processing (DPIP) is proposed as a simple mechanism that fetches, decodes, and renames, but does not execute, instructions from the alternative path for low-confidence predicted branches at the same time as the predicted path is being executed. All the stages of the pipeline front-end are hidden once the misprediction is detected. This method thus targets the pipeline-fill penalty and is shown to achieve a good trade-off between performance and complexity. To reduce the window-fill penalty, we further propose the addition of a pre-scheduling engine that schedules instructions from the alternative path in an estimated execution order. Thus, after a misprediction, a large number of instructions from the alternative path can be immediately issued to execution, achieving an effect similar to very fast re-filling of the window. Performance evaluation of DPIP in a 14-stage superscalar processor (like IBM Power 4) shows an IPC improvement of up to 10% for the bzip2 benchmark, and an average improvement of 8% for ten benchmarks from the SPECint95 and SPECint2000 suites.
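As a rough illustration of the three-way penalty breakdown described in the abstract, the following Python sketch models the cycles lost on a single mispredicted branch. It is a toy cost model under stated assumptions, not the paper's simulator: the class `Penalty`, the function `misprediction_cost`, its flags, and all cycle counts are hypothetical. Pre-scheduling is idealized here as hiding the entire window-fill penalty, whereas the abstract only claims an effect similar to very fast re-filling of the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Penalty:
    """Hypothetical per-misprediction penalties, in cycles."""
    pipeline_fill: int   # refilling the front-end (fetch, decode, rename)
    window_fill: int     # refilling the instruction window
    serialization: int   # waiting for the branch itself to resolve


def misprediction_cost(p: Penalty,
                       alt_path_ready: bool = False,
                       prescheduled: bool = False) -> int:
    """Cycles lost on one misprediction in this toy model.

    alt_path_ready: the alternative path was already fetched, decoded and
        renamed (the branch was flagged low confidence), so the
        pipeline-fill penalty is hidden.
    prescheduled: the alternative-path instructions were also
        pre-scheduled; idealized (assumption) as hiding the whole
        window-fill penalty. It has no effect unless alt_path_ready.
    """
    # The serialization penalty is never removed: the alternative path
    # is not executed before the branch resolves.
    cost = p.serialization
    if not alt_path_ready:
        cost += p.pipeline_fill
    if not (alt_path_ready and prescheduled):
        cost += p.window_fill
    return cost


if __name__ == "__main__":
    example = Penalty(pipeline_fill=6, window_fill=4, serialization=3)
    print("baseline:            ", misprediction_cost(example))
    print("alt path ready:      ", misprediction_cost(example, alt_path_ready=True))
    print("plus pre-scheduling: ", misprediction_cost(
        example, alt_path_ready=True, prescheduled=True))
```

The serialization term stays in every case, which mirrors the trade-off the abstract describes: the scheme gives up on that penalty in exchange for not executing instructions from both paths.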



