ABSTRACT
Ensuring back-to-back execution of dependent instructions in a conventional out-of-order processor requires scheduling logic that wakes up and selects instructions at the same rate as they are executed. To sustain high performance, integer ALU instructions typically have single-cycle latency, consequently requiring scheduling logic with the same single-cycle latency. Prior proposals have advocated the use of speculation in either the wakeup or select phases to enable pipelining of scheduling logic to achieve higher clock frequency. In contrast, this paper proposes macro-op scheduling, which systematically removes instructions with single-cycle latency from the machine by combining them into macro-ops, and performs nonspeculative pipelined scheduling of multi-cycle operations. Macro-op scheduling also increases the effective size of the scheduling window by enabling multiple instructions to occupy a single issue queue entry. We demonstrate that pipelined 2-cycle macro-op scheduling performs comparably to, or even better than, atomic scheduling or prior proposals for select-free scheduling.
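The grouping idea in the abstract can be illustrated with a small sketch. This is not the paper's detection or formation algorithm: the greedy pairing rule, the `Instr` record, and the functions `form_macro_ops` and `entry_latency` are all hypothetical names chosen for illustration, and the sketch ignores any further constraints a real design would presumably need. It only shows how pairing a single-cycle instruction with a single-cycle dependent yields a 2-cycle unit that occupies one issue queue entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (not from the paper)."""
    name: str
    dest: str
    srcs: tuple
    latency: int = 1  # cycles; 1 models a simple integer ALU op


def form_macro_ops(instrs):
    """Greedily pair each single-cycle instruction with its first
    ungrouped single-cycle dependent. Assumed heuristic, for illustration.
    Returns a list of issue queue entries, each a list of instructions."""
    grouped = set()
    entries = []
    for i, head in enumerate(instrs):
        if i in grouped:
            continue
        grouped.add(i)
        group = [head]
        if head.latency == 1:
            for j in range(i + 1, len(instrs)):
                tail = instrs[j]
                reads_head = head.dest in tail.srcs
                if j not in grouped and tail.latency == 1 and reads_head:
                    grouped.add(j)
                    group.append(tail)
                    break
                if tail.dest == head.dest:
                    # head's result is overwritten; later readers
                    # of this register no longer depend on head
                    break
        entries.append(group)
    return entries


def entry_latency(group):
    """Latency of an entry when its instructions execute back to back."""
    return sum(op.latency for op in group)


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),             # reads add's result
    Instr("load", "r6", ("r4",), latency=3),      # already multi-cycle
    Instr("and", "r7", ("r6", "r1")),
    Instr("or", "r8", ("r7", "r2")),              # reads and's result
]

entries = form_macro_ops(program)
for group in entries:
    print([op.name for op in group], entry_latency(group))
```

In this example the five instructions occupy three entries, and no entry is left with single-cycle latency, which is the property that lets the scheduling loop be pipelined over two cycles in the scheme the abstract describes.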