ABSTRACT
Process variation affects processor pipelines by making some stages slower and others faster, which exacerbates pipeline imbalance and reduces the frequency the pipeline can attain. To improve performance, this paper proposes ReCycle, an architectural framework that comprehensively applies cycle time stealing to the pipeline: it transfers the time slack of the faster stages to the slower ones by skewing clock arrival times to latching elements after fabrication. As a result, the pipeline can be clocked with a period equal to the average stage delay rather than the longest one. In addition, ReCycle's frequency gains are enhanced with donor stages, which are empty stages added to "donate" slack to the slow stages. Finally, ReCycle can also convert slack into power reductions. For a 17FO4 pipeline, ReCycle increases the frequency by 12% and the application performance by 9% on average. Combining ReCycle and donor stages delivers improvements of 36% in frequency and 15% in performance on average, completely reclaiming the performance losses due to variation.
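The difference between clocking at the longest stage delay and at the average stage delay can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes an idealized pipeline in which slack moves freely between all stages, models a donor stage as an extra stage with zero logic delay, and uses hypothetical stage delays and function names chosen only for illustration.

```python
def period_without_stealing(stage_delays):
    """Conventional clocking: the slowest stage sets the clock period."""
    return max(stage_delays)


def period_with_ideal_stealing(stage_delays, donor_stages=0, donor_delay=0.0):
    """Idealized cycle time stealing (assumption: slack transfers freely).

    The period approaches the mean stage delay. Each donor stage is
    modeled as an extra stage contributing only `donor_delay`.
    """
    delays = list(stage_delays) + [donor_delay] * donor_stages
    return sum(delays) / len(delays)


# Hypothetical post-fabrication stage delays, in picoseconds.
delays_ps = [90, 100, 130, 80]

base = period_without_stealing(delays_ps)
stolen = period_with_ideal_stealing(delays_ps)
with_donor = period_with_ideal_stealing(delays_ps, donor_stages=1)

# Frequency is the inverse of the period, so the gain is a ratio of periods.
gain_stealing = base / stolen - 1.0
gain_donor = base / with_donor - 1.0

print(f"period: {base} ps -> {stolen} ps -> {with_donor} ps (with one donor stage)")
print(f"frequency gain: {gain_stealing:.0%} (stealing), {gain_donor:.0%} (stealing + donor)")
```

The sketch ignores latch overhead, limits on how far each clock can be skewed, and the extra cycle of latency a donor stage introduces, so its gains are an upper bound for the chosen delays rather than a prediction of the reported results.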