The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
Source: International Symposium on Computer Architecture
Proceedings of the 29th Annual International Symposium on Computer Architecture
Anchorage, Alaska
Session 1: Processor pipelines
Pages: 14 - 24  
Year of Publication: 2002
ISSN: 1063-6897, ISBN: 0-7695-1605-X
Authors
M. S. Hrishikesh  The University of Texas, Austin
Doug Burger  The University of Texas, Austin
Norman P. Jouppi  Compaq Computer Corporation
Stephen W. Keckler  The University of Texas, Austin
Keith I. Farkas  Compaq Computer Corporation
Premkishore Shivakumar  The University of Texas, Austin
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE TCCA : IEEE Computer Society Technical Committee on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA

ABSTRACT

Microprocessor clock frequency has improved by nearly 40% annually over the past decade. This improvement has been provided, in equal measure, by smaller technologies and deeper pipelines. From our study of the SPEC 2000 benchmarks, we find that for a high-performance architecture implemented in 100nm technology, the optimal clock period is approximately 8 fan-out-of-four (FO4) inverter delays for integer benchmarks, consisting of 6 FO4 of useful work and about 2 FO4 of overhead. The optimal clock period for floating-point benchmarks is 6 FO4. We find these optimal points to be insensitive to latch and clock skew overheads. Our study indicates that further pipelining can at best improve the performance of integer programs by a factor of 2 over current designs. At these high clock frequencies, it will be difficult to design the instruction issue window to operate in a single cycle. Consequently, we propose and evaluate a high-frequency design called a segmented instruction window.
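
To make the FO4 arithmetic in the abstract concrete, the following minimal sketch converts a clock period expressed in FO4 delays into an approximate clock frequency. The abstract does not give an absolute FO4 delay; the conversion of roughly 360 ps per micron of gate length is a commonly quoted rule of thumb and is an assumption here, as are all function and constant names.

```python
# Sketch only: converts FO4-based clock periods into approximate frequencies.
# ASSUMPTION (not stated in the abstract): one FO4 delay is roughly
# 360 ps multiplied by the gate length in microns.
FO4_PS_PER_MICRON = 360.0


def fo4_delay_ps(gate_length_nm):
    """Approximate delay of one FO4 inverter, in picoseconds."""
    return FO4_PS_PER_MICRON * gate_length_nm / 1000.0


def clock_period_fo4(useful_fo4, overhead_fo4):
    """Clock period = useful logic per stage + latch/skew/jitter overhead."""
    return useful_fo4 + overhead_fo4


def clock_frequency_ghz(period_fo4, gate_length_nm):
    """Clock frequency implied by a period given in FO4 delays."""
    period_ps = period_fo4 * fo4_delay_ps(gate_length_nm)
    return 1000.0 / period_ps  # a 1000 ps period corresponds to 1 GHz


if __name__ == "__main__":
    # Integer benchmarks: 6 FO4 useful work + about 2 FO4 overhead.
    int_period = clock_period_fo4(6, 2)
    print("integer period (FO4):", int_period)
    print("integer clock at 100nm (GHz):",
          round(clock_frequency_ghz(int_period, 100), 2))
    # Floating-point benchmarks: 6 FO4 total clock period.
    print("floating-point clock at 100nm (GHz):",
          round(clock_frequency_ghz(6, 100), 2))
```

Under the assumed rule of thumb, one FO4 at 100nm is about 36 ps, so the absolute frequencies printed here are only as reliable as that constant; the FO4 counts themselves are the quantities the abstract reports.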


