Beating in-order stalls with "flea-flicker" two-pass pipelining
Source
International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Page: 387
Year of Publication: 2003
ISBN:0-7695-2043-X
Authors
Ronald D. Barnes, Erik M. Nystrom, John W. Sias, Sanjay J. Patel, Nacho Navarro, Wen-mei W. Hwu
Affiliation (all authors): Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Publisher
IEEE Computer Society
Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 1, Downloads (12 Months): 19, Citation Count: 10
ABSTRACT
Accommodating the uncertain latency of load instructions is one of the most vexing problems in in-order microarchitecture design and compiler development. Compilers can generate schedules with a high degree of instruction-level parallelism but cannot effectively accommodate unanticipated latencies; incorporating traditional out-of-order execution into the microarchitecture hides some of this latency but redundantly performs work done by the compiler and adds additional pipeline stages. Although effective techniques, such as prefetching and threading, have been proposed to deal with anticipable, long-latency misses, the shorter, more diffuse stalls due to difficult-to-anticipate, first- or second-level misses are less easily hidden on in-order architectures. This paper addresses this problem by proposing a microarchitectural technique, referred to as two-pass pipelining, wherein the program executes on two in-order back-end pipelines coupled by a queue. The "advance" pipeline executes instructions greedily, without stalling on unanticipated latency dependences (executing independent instructions while otherwise blocking instructions are deferred). The "backup" pipeline allows concurrent resolution of instructions that were deferred in the other pipeline, resulting in the absorption of shorter misses and the overlap of longer ones. This paper argues that this design is both achievable and a good use of transistor resources and shows results indicating that it can deliver significant speedups for in-order processor designs.
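The deferral idea in the abstract can be illustrated with a toy cycle-count model. This is only a sketch under strong assumptions and is not the paper's simulator or design: the `Instr` tuple, the one-instruction-per-cycle issue rate, the single-assignment registers, the unbounded coupling queue, and the example latencies are all hypothetical choices made for illustration. It compares a single stalling in-order pipeline with an advance pass that defers blocked instructions and a backup pass that resolves them.

```python
from collections import namedtuple

# Hypothetical toy instruction: destination register, source registers, and
# result latency in cycles (a large latency stands in for a cache miss).
# Assumption: every register is written exactly once.
Instr = namedtuple("Instr", "dest srcs latency")


def in_order_cycles(program):
    """Baseline: one in-order pipeline that stalls until operands are ready."""
    ready = {}   # register -> cycle at which its value becomes available
    cycle = 0
    for ins in program:
        start = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        ready[ins.dest] = start + ins.latency
        cycle = start + 1
    return max([cycle] + list(ready.values()))


def two_pass_cycles(program):
    """Toy two-pass model: advance pass defers, backup pass resolves."""
    ready = {}
    deferred_regs = set()
    queue = []   # coupling queue: (instruction, advance issue cycle, deferred?)

    # Advance pass: never stall. An instruction whose operand is not ready,
    # or was produced by a deferred instruction, is itself deferred.
    cycle = 0
    for ins in program:
        blocked = any(
            r in deferred_regs or ready.get(r, 0) > cycle for r in ins.srcs
        )
        if blocked:
            deferred_regs.add(ins.dest)
        else:
            ready[ins.dest] = cycle + ins.latency
        queue.append((ins, cycle, blocked))
        cycle += 1

    # Backup pass: drain the queue in program order, trailing the advance
    # pass by at least one cycle. Only deferred instructions execute here,
    # and only they can stall; the rest just retire precomputed results.
    b_cycle = 0
    for ins, a_cycle, was_deferred in queue:
        b_cycle = max(b_cycle, a_cycle + 1)
        if was_deferred:
            start = max([b_cycle] + [ready.get(r, 0) for r in ins.srcs])
            ready[ins.dest] = start + ins.latency
            b_cycle = start + 1
        else:
            b_cycle += 1
    return max([b_cycle] + list(ready.values()))


# Two independent load misses (latency 8), each followed by a dependent use.
PROGRAM = [
    Instr("r1", (), 8),            # load miss
    Instr("r2", ("r1",), 1),       # uses the first load
    Instr("r3", (), 8),            # independent load miss
    Instr("r4", ("r3",), 1),       # uses the second load
    Instr("r5", ("r2", "r4"), 1),  # combines both results
]

if __name__ == "__main__":
    print("in-order:", in_order_cycles(PROGRAM))
    print("two-pass:", two_pass_cycles(PROGRAM))
```

On this hypothetical program the baseline model takes 19 cycles and the two-pass model takes 12, because the second miss is issued while the first is still outstanding. The numbers describe only this toy model and say nothing about the speedups reported in the paper.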
CITED BY 11
Shailender Chaudhry, Robert Cypher, Magnus Ekman, Martin Karlsson, Anders Landin, Sherman Yip, Håkan Zeffer, Marc Tremblay, Simultaneous speculative threading: a novel pipeline architecture implemented in Sun's Rock processor, ACM SIGARCH Computer Architecture News, v.37 n.3, June 2009
Guilherme Ottoni, Ram Rangan, Adam Stoler, David I. August, Automatic Thread Extraction with Decoupled Software Pipelining, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.105-118, November 12-16, 2005, Barcelona, Spain
John W. Sias, Sain-zee Ueng, Geoff A. Kent, Ian M. Steiner, Erik M. Nystrom, Wen-mei W. Hwu, Field-testing IMPACT EPIC research results in Itanium 2, ACM SIGARCH Computer Architecture News, v.32 n.2, p.26, March 2004
Ronald D. Barnes, John W. Sias, Erik M. Nystrom, Sanjay J. Patel, Jose (Nacho) Navarro, Wen-mei W. Hwu, Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining, IEEE Transactions on Computers, v.55 n.1, p.18-33, January 2006