ABSTRACT
Instruction issue logic is a critical component in modern high-performance out-of-order processors. The ever-increasing latencies found in modern processors, mostly associated with memory accesses and longer pipelines, can be attenuated using large issue queues. Conventional designs rely on atomic wakeup-select cycles to ensure compact scheduling. These designs must make aggressive use of broadcasting, compaction, and heavily ported structures, all of which scale poorly in both power consumption and access time. To provide high scheduling flexibility and large instruction capacity without incurring prohibitive latency and energy overhead, we propose a novel scheme that uses an out-of-order, broadcast-free instruction wakeup block feeding an in-order scheduler. Multi-banked, index-based structures are used throughout this scheme to provide a high degree of scalability while achieving efficient dependence tracking, resulting in good overall performance and energy efficiency. We call this design "Scalable, Efficient Enforcement of Dependences" (SEED). We present a detailed design of SEED and analyze it through an extensive evaluation. Compared to a conventional issue queue design, which is favorably assumed to scale in size without any impact on cycle time, our design degrades performance by 3% for both the INT and FP suites of SPEC CPU2000. For this small performance cost, SEED achieves a 19% reduction in total chip power consumption for a 32-entry configuration. We also synthesize SEED and a conventional issue logic design using 90nm standard cells. Synthesis results show that SEED can be clocked at twice the frequency of a conventional issue logic of equivalent size. When both run at the same frequency, SEED consumes one-tenth the dynamic power and one-fifth the static power of the conventional design while achieving substantial area savings.
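To illustrate the contrast drawn above between broadcast-based wakeup and index-based dependence tracking, the following Python sketch compares the two on a toy queue. It is a generic illustration under stated assumptions, not SEED's actual design: the `Entry` class, the `consumers` table mapping a producer tag to the queue slots of its dependents, the FIFO standing in for the in-order scheduler, and all instruction names and tags are hypothetical. Counting one comparison per entry is also a simplification, since real CAM-based queues compare each source operand.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical issue-queue entry: an instruction waiting on source tags."""
    name: str
    pending: set  # physical-register tags this instruction still waits for


def broadcast_wakeup(queue, tag):
    """CAM-style wakeup: the completed tag is compared against every entry."""
    woken, comparisons = [], 0
    for entry in queue:
        comparisons += 1  # simplification: one comparison per entry
        if tag in entry.pending:
            entry.pending.discard(tag)
            if not entry.pending:
                woken.append(entry.name)
    return woken, comparisons


def indexed_wakeup(queue, consumers, tag):
    """Broadcast-free wakeup: the tag indexes a (hypothetical) table of dependent slots."""
    woken, accesses = [], 0
    for slot in consumers.pop(tag, []):
        accesses += 1  # only recorded dependents are touched
        entry = queue[slot]
        entry.pending.discard(tag)
        if not entry.pending:
            woken.append(entry.name)
    return woken, accesses


def issue_in_order(ready_fifo, width):
    """Stand-in for an in-order scheduler: issue up to `width` oldest ready ops."""
    return [ready_fifo.popleft() for _ in range(min(width, len(ready_fifo)))]


def build_example():
    """Toy queue plus the producer-tag -> dependent-slot table (hypothetical)."""
    queue = [
        Entry("add r3", {1}),
        Entry("mul r4", {1, 2}),
        Entry("sub r5", {7}),
        Entry("or r6", {2}),
    ]
    consumers = {1: [0, 1], 2: [1, 3], 7: [2]}
    return queue, consumers


if __name__ == "__main__":
    queue, _ = build_example()
    print(broadcast_wakeup(queue, 1))  # (['add r3'], 4): every entry compared

    queue, consumers = build_example()
    ready = deque()
    for tag in (1, 2):
        woken, accesses = indexed_wakeup(queue, consumers, tag)
        print(tag, woken, accesses)
        ready.extend(woken)
    print(issue_in_order(ready, width=2))  # ['add r3', 'mul r4']
```

The broadcast version touches every entry on each completion, so its work grows with queue size; the indexed version touches only the recorded dependents, which is the property that lets index-based structures scale. A real design must also bound and bank the consumer table, which this sketch ignores.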