Complexity-effective superscalar processors
Source: International Symposium on Computer Architecture
Proceedings of the 24th Annual International Symposium on Computer Architecture
Denver, Colorado, United States
Pages: 206 - 218  
Year of Publication: 1997
ISBN:0-89791-901-7
Authors
Subbarao Palacharla  Computer Sciences Department, University of Wisconsin-Madison, Madison, WI
Norman P. Jouppi  Western Research Laboratory, Digital Equipment Corporation, Palo Alto, CA
J. E. Smith  Dept. of Electrical and Computer Engg., University of Wisconsin-Madison, Madison, WI
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 35,   Downloads (12 Months): 194,   Citation Count: 211
DOI: http://doi.acm.org/10.1145/264107.264201

ABSTRACT

The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and simulated with Spice for feature sizes of 0.8µm, 0.35µm, and 0.18µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future.

A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues, and issues instructions from multiple queues in parallel. Simulation shows little slowdown as compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simplified and the clock cycle is faster; consequently, overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.
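
The queue-based issue scheme described in the abstract can be illustrated with a small model. This is a minimal sketch and not the authors' implementation. The function names `steer` and `issue`, the single-cycle latency, the unbounded queue depth, and the fallback to the shortest queue when no queue is empty are all assumptions made for illustration. It shows dependent instructions being chained into the same queue, and an issue step that examines only the queue heads.

```python
from collections import deque


def steer(program, num_queues):
    """Distribute instructions into FIFO queues in program order.

    program: list of (name, [names of producer instructions]).
    """
    queues = [deque() for _ in range(num_queues)]
    for name, sources in program:
        # Prefer a queue whose tail produces one of this instruction's operands,
        # so that a dependence chain stays in one queue.
        target = next((q for q in queues if q and q[-1][0] in sources), None)
        if target is None:
            # Otherwise start a new chain in an empty queue.
            target = next((q for q in queues if not q), None)
        if target is None:
            # Assumption: with no empty queue, fall back to the shortest one.
            # A real front end might stall dispatch instead.
            target = min(queues, key=len)
        target.append((name, sources))
    return queues


def issue(queues):
    """Return {name: issue cycle}, assuming every instruction takes one cycle.

    Only the head of each queue is checked for readiness in a given cycle.
    """
    queues = [deque(q) for q in queues]  # work on copies
    done = {}
    cycle = 0
    while any(queues):
        ready = [q for q in queues if q and all(s in done for s in q[0][1])]
        if not ready:
            raise ValueError("a queue head depends on an unknown instruction")
        for q in ready:
            name, _ = q.popleft()
            done[name] = cycle
        cycle += 1
    return done


# Hypothetical six-instruction program: two chains that join at f.
program = [
    ("a", []), ("b", ["a"]), ("c", ["b"]),
    ("d", []), ("e", ["d"]),
    ("f", ["c", "e"]),
]
queues = steer(program, num_queues=2)
schedule = issue(queues)
```

In this example the chain a, b, c, f lands in one queue and d, e in the other, and f issues in cycle 3, which is also the earliest cycle its data dependences allow. Programs with more parallel chains than queues would expose the cost of the fallback rule assumed here.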

