ACM Home Page
Please provide us with feedback. Feedback
Beating in-order stalls with "flea-flicker" two-pass pipelining
Full text PdfPdf (247 KB)
Source International Symposium on Microarchitecture archive
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture table of contents
Page: 387  
Year of Publication: 2003
ISBN:0-7695-2043-X
Authors
Ronald D. Barnes  Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Erik M. Nystrom  Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
John W. Sias  Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Sanjay J. Patel  Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Nacho Navarro  Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Wen-mei W. Hwu  Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Sponsor
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
IEEE Computer Society  Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 1,   Downloads (12 Months): 19,   Citation Count: 10
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

Accommodating the uncertain latency of load instructionsis one of the most vexing problems in in-order microarchitecturedesign and compiler development. Compilers cangenerate schedules with a high degree of instruction-levelparallelism but cannot effectively accommodate unanticipatedlatencies; incorporating traditional out-of-order executioninto the microarchitecture hides some of this latencybut redundantly performs work done by the compiler andadds additional pipeline stages. Although effective techniques,such as prefetching and threading, have been proposedto deal with anticipable, long-latency misses, theshorter, more diffuse stalls due to difficult-to-anticipate,first- or second-level misses are less easily hidden on in-orderarchitectures. This paper addresses this problemby proposing a microarchitectural technique, referred toas two-pass pipelining, wherein the program executes ontwo in-order back-end pipelines coupled by a queue. The"advance" pipeline executes instructions greedily, withoutstalling on unanticipated latency dependences (executingindependent instructions while otherwise blocking instructionsare deferred). The "backup" pipeline allows concurrentresolution of instructions that were deferred in theother pipeline, resulting in the absorption of shorter missesand the overlap of longer ones. This paper argues that thisdesign is both achievable and a good use of transistor resourcesand shows results indicating that it can deliver significantspeedups for in-order processor designs.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
3
 
4
[4] Intel Corporation, Intel Itanium 2 Processor Reference Manual for Software Development and Optimization, Apr. 2003.
5
 
6
7
8
 
9
10
 
11
[11] E. S. Fetzer, M. Gibson, A. Klein, N. Calick, C. Zhu, E. Busta, and B. Mohammad, "A fully bypassed six-issue integer datapath and register file on the itanium-2 microprocessor," IEEE Journal of Solid-State Circuits, vol. 37, Nov. 2002.
 
12
13
 
14
15
16
17
 
18
19

CITED BY  11

Collaborative Colleagues:
Ronald D. Barnes: colleagues
Erik M. Nystrom: colleagues
John W. Sias: colleagues
Sanjay J. Patel: colleagues
Nacho Navarro: colleagues
Wen-mei W. Hwu: colleagues