Power-efficient instruction delivery through trace reuse
Source: PACT, Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Seattle, Washington, USA
SESSION: Instruction fetch and control flow
Pages: 192 - 201
Year of Publication: 2006
ISBN: 1-59593-264-X
Bibliometrics: Downloads (6 Weeks): 4, Downloads (12 Months): 35, Citation Count: 1
ABSTRACT
As power dissipation inexorably becomes the major bottleneck in system integration and reliability, the front-end instruction delivery path in a traditional out-of-order superscalar processor needs to deliver high application performance in an energy-effective manner. This challenge can be addressed by efficiently reusing the fetch and decode work that was performed during preceding loop iterations and that remains mostly resident within the processor itself. Because a large percentage of the instructions currently under fetch have previously dispatched copies resident in the Reorder Buffer (ROB), in this paper we develop a mechanism that uses the ROB as a storage location for previously decoded instructions. Instructions can thus be fed directly from the ROB into the rename and issue stages, which allows the fetch and decode logic to be gated off for long periods of time and delivers significant power savings. The power and performance criticality of the ROB requires an efficient reuse identification mechanism; we outline a cost-efficient Reuse Identification Unit (RIU) that effectively identifies matches between the ROB entries and the instructions currently under fetch. Simulation results on both multimedia and SPEC 2000 benchmarks confirm that incorporating the proposed technique into traditional out-of-order superscalar processors results not only in a slight improvement in performance but also in significant savings in overall system power dissipation, achieved within a limited hardware budget.
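The reuse idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's RIU design: it assumes a ROB modeled as a fixed-size ring of recently dispatched instructions, a lookup keyed only on the program counter, and hypothetical names (`RobEntry`, `ReuseFrontEnd`, `run_loop`) chosen for illustration. It only counts how often a loop body could be served from the buffer instead of being fetched and decoded again.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    pc: int        # address of the instruction
    decoded: str   # stand-in for the decoded form of the instruction


class ReuseFrontEnd:
    """Toy front end: serve an instruction from the ROB model when a
    previously dispatched copy with the same PC is still resident."""

    def __init__(self, rob_size: int):
        # Assumption: the oldest entry is overwritten when the buffer is full.
        self.rob = deque(maxlen=rob_size)
        self.fetch_decode_count = 0  # deliveries that needed fetch + decode
        self.reuse_count = 0         # deliveries served from the ROB model

    def _lookup(self, pc: int):
        # Assumption: a linear PC match stands in for reuse identification.
        for entry in self.rob:
            if entry.pc == pc:
                return entry
        return None

    def deliver(self, pc: int) -> str:
        hit = self._lookup(pc)
        if hit is not None:
            self.reuse_count += 1
            decoded = hit.decoded
        else:
            self.fetch_decode_count += 1
            decoded = f"decoded@{pc:#x}"
        # Every delivered instruction is dispatched into the ROB model.
        self.rob.append(RobEntry(pc, decoded))
        return decoded


def run_loop(rob_size: int, loop_pcs: list, iterations: int):
    """Deliver a loop body repeatedly; return (fetch_decode, reuse) counts."""
    front_end = ReuseFrontEnd(rob_size)
    for _ in range(iterations):
        for pc in loop_pcs:
            front_end.deliver(pc)
    return front_end.fetch_decode_count, front_end.reuse_count


if __name__ == "__main__":
    print(run_loop(16, [0, 4, 8, 12], 10))  # loop fits in the buffer
    print(run_loop(2, [0, 4, 8, 12], 10))   # loop is larger than the buffer
```

Under these assumptions, a four-instruction loop run ten times against a 16-entry buffer is fetched and decoded only on its first iteration, while the same loop against a 2-entry buffer never finds a resident copy. This is only meant to show why loop-dominated code leaves long windows in which fetch and decode could be gated off; it says nothing about the power or timing figures reported in the paper.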
CITED BY
Frederico Pratas, Georgi Gaydadjiev, Mladen Berekovic, Leonel Sousa, Stefanos Kaxiras, Low power microarchitecture with instruction reuse, Proceedings of the 2008 conference on Computing frontiers, May 05-07, 2008, Ischia, Italy