ABSTRACT
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and simulated with Spice for feature sizes of 0.8µm, 0.35µm, and 0.18µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic, as well as operand bypass logic, are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues and issues instructions from multiple queues in parallel. Simulation shows little slowdown compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only the instructions at the queue heads need to be awakened and selected, the issue logic is simpler and the clock cycle is shorter; consequently, overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.
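The queue-based issue scheme described in the abstract can be illustrated with a small sketch. The abstract does not give the steering rule, so everything here is an assumption made for illustration: the `Instr` record, the `steer` and `issue` helpers, unit instruction latency, unique destination names (as if renaming had already been done), and the shortest-queue fallback when no queue is empty.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str          # register written (assumed unique, i.e. already renamed)
    srcs: tuple = ()   # registers read


def steer(program, num_queues):
    """Append each instruction behind the producer of one of its operands
    when that producer sits at a queue tail; otherwise start a new chain."""
    queues = [deque() for _ in range(num_queues)]
    for ins in program:
        # Prefer a queue whose tail produces one of our source operands.
        target = next((q for q in queues if q and q[-1].dest in ins.srcs), None)
        if target is None:
            # Otherwise start a new chain in an empty queue.
            target = next((q for q in queues if not q), None)
        if target is None:
            # Hypothetical fallback: use the shortest queue.
            target = min(queues, key=len)
        target.append(ins)
    return queues


def issue(queues, live_in=()):
    """Each cycle, examine only the queue heads and issue the ready ones.
    Results become visible to consumers in the following cycle."""
    queues = [deque(q) for q in queues]
    ready = set(live_in)
    schedule = []
    while any(queues):
        heads = [q for q in queues if q and all(s in ready for s in q[0].srcs)]
        if not heads:
            raise RuntimeError("no queue head is ready")
        issued = [q.popleft() for q in heads]
        ready.update(i.dest for i in issued)
        schedule.append([i.name for i in issued])
    return schedule


program = [
    Instr("i1", "r1"),
    Instr("i2", "r2", ("r1",)),
    Instr("i3", "r3"),
    Instr("i4", "r4", ("r3",)),
    Instr("i5", "r5", ("r2", "r4")),
]
queues = steer(program, num_queues=2)
schedule = issue(queues)
```

In this toy program the two independent chains land in separate queues, so the wakeup and selection step only ever inspects two instructions per cycle, however many instructions are waiting behind them.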