ABSTRACT
Microprocessor clock frequency has improved by nearly 40% annually over the past decade. This improvement has been provided, in equal measure, by smaller technologies and deeper pipelines. From our study of the SPEC 2000 benchmarks, we find that for a high-performance architecture implemented in 100nm technology, the optimal clock period is approximately 8 fan-out-of-four (FO4) inverter delays for integer benchmarks, consisting of 6 FO4 of useful work and about 2 FO4 of overhead. The optimal clock period for floating-point benchmarks is 6 FO4. We find these optimal points to be insensitive to latch and clock skew overheads. Our study indicates that further pipelining can at best improve the performance of integer programs by a factor of 2 over current designs. At these high clock frequencies, it will be difficult to design the instruction issue window to operate in a single cycle. Consequently, we propose and evaluate a high-frequency design called a segmented instruction window.
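
To give the FO4 figures above a rough physical scale, the following minimal sketch converts a clock period in FO4 delays into a clock frequency. It relies on an assumption that is not stated in the abstract: the common rule of thumb that one FO4 delay is roughly 360 ps per micron of drawn gate length, and it treats "100nm" as that drawn gate length. The function names are hypothetical and chosen only for illustration.

```python
# Assumption (not from the abstract): one FO4 inverter delay is roughly
# 360 ps per micron of drawn gate length. This is a rule of thumb only.
FO4_PS_PER_MICRON = 360.0


def fo4_delay_ps(gate_length_nm: float) -> float:
    """Approximate delay of one FO4 inverter, in picoseconds."""
    return FO4_PS_PER_MICRON * gate_length_nm / 1000.0


def clock_frequency_ghz(period_fo4: float, gate_length_nm: float) -> float:
    """Clock frequency implied by a clock period given in FO4 delays."""
    period_ps = period_fo4 * fo4_delay_ps(gate_length_nm)
    return 1000.0 / period_ps  # 1000 ps per ns, so 1/ns gives GHz


def useful_fraction(useful_fo4: float, overhead_fo4: float) -> float:
    """Share of each clock period spent on useful logic, not overhead."""
    return useful_fo4 / (useful_fo4 + overhead_fo4)


if __name__ == "__main__":
    tech_nm = 100.0  # technology quoted in the abstract
    # Integer optimum from the abstract: 6 FO4 useful + about 2 FO4 overhead.
    int_period = 6.0 + 2.0
    # Floating-point optimum from the abstract: 6 FO4 clock period.
    fp_period = 6.0
    print(f"FO4 delay: {fo4_delay_ps(tech_nm):.1f} ps")
    print(f"Integer optimum: {clock_frequency_ghz(int_period, tech_nm):.2f} GHz")
    print(f"FP optimum: {clock_frequency_ghz(fp_period, tech_nm):.2f} GHz")
    print(f"Useful share (integer): {useful_fraction(6.0, 2.0):.0%}")
```

Under this assumed scaling, the overhead share is what limits the benefit of deeper pipelining: shrinking the useful work per stage leaves the roughly 2 FO4 of overhead unchanged, so it occupies a growing fraction of each cycle.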