ABSTRACT
The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, because of both diminishing improvements in clock rates and poor wire scaling as semiconductor devices shrink, the achievable performance growth of conventional microarchitectures will slow substantially. In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay. Using these models, we measure the simulated performance (estimating both clock rate and instructions per cycle, or IPC) of an aggressive out-of-order microarchitecture as it is scaled from a 250nm technology to a 35nm technology. We perform this analysis for three clock scaling targets and two microarchitecture scaling strategies: pipeline scaling and capacity scaling. We find that no scaling strategy permits annual performance improvements of better than 12.5%, which is far worse than the annual 50-60% to which we have grown accustomed.
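To put the growth rates quoted above in perspective, the following sketch converts a constant annual improvement rate into a doubling time and into a cumulative speedup over a fixed horizon. It is plain compound-growth arithmetic, not part of the paper's models. It assumes the rate stays constant every year; the helper names `doubling_time` and `cumulative_speedup` and the ten-year horizon are hypothetical choices made only for illustration.

```python
import math


def doubling_time(annual_rate: float) -> float:
    """Years needed to double performance at a constant annual growth rate."""
    # Solve (1 + r)^t = 2 for t; log1p(r) = ln(1 + r) is accurate for small r.
    return math.log(2) / math.log1p(annual_rate)


def cumulative_speedup(annual_rate: float, years: float) -> float:
    """Total performance multiple after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # Hypothetical 10-year horizon, chosen only for illustration.
    horizon = 10
    cases = [
        ("best scaling strategy (12.5%/yr)", 0.125),
        ("historical, low end (50%/yr)", 0.50),
        ("historical, high end (60%/yr)", 0.60),
    ]
    for label, rate in cases:
        print(f"{label}: doubles every {doubling_time(rate):.2f} years, "
              f"{cumulative_speedup(rate, horizon):.1f}x after {horizon} years")
```

Under this constant-rate assumption, 12.5% per year doubles performance only about every 5.9 years, whereas 50-60% per year doubles it roughly every 1.5 to 1.7 years. Over the hypothetical ten-year horizon, that is about a 3.2x gain versus roughly 58x to 110x.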