Power-efficient instruction delivery through trace reuse
Source: PACT
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Seattle, Washington, USA
SESSION: Instruction fetch and control flow
Pages: 192 - 201  
Year of Publication: 2006
ISBN:1-59593-264-X
Authors
Chengmo Yang  University of California, San Diego, La Jolla, CA
Alex Orailoglu  University of California, San Diego, La Jolla, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1152154.1152185

ABSTRACT

As power dissipation inexorably becomes the major bottleneck in system integration and reliability, the front-end instruction delivery path in a traditional out-of-order superscalar processor needs to deliver high application performance in an energy-effective manner. This challenge can be addressed by efficiently reusing the work of fetch and decode performed during preceding loop iterations and resident mostly within the processor itself. As a large percentage of the instructions currently under fetch have previously dispatched copies resident in the Reorder Buffer (ROB), in this paper we develop a mechanism to utilize the ROB as a storage location for previously decoded instructions. Thus instructions can be fed directly from the ROB into the rename and issue stages, enabling the gating off of the fetch and decode logic for large periods of time so as to deliver significant power savings. Power and performance criticality of the ROB requires an efficient reuse identification mechanism; we outline such a cost-efficient Reuse Identification Unit (RIU) which enables effective identification of the matches between the ROB entries and the instructions currently under fetch. Simulation results on both multimedia and SPEC 2000 benchmarks confirm that incorporating the proposed technique on traditional out-of-order superscalar processors results in not only a slight improvement in performance, but also significant savings in the overall system power dissipation, achieved within a limited hardware budget.
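
The reuse idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's RIU design, which the abstract does not detail; it assumes a ROB modelled as a fixed-size FIFO, a match found by comparing program counters, and hypothetical names (`RobEntry`, `simulate`). It only counts how many dynamic instructions of a loop could, in principle, be supplied from ROB-resident copies instead of being fetched and decoded again.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RobEntry:
    pc: int        # instruction address (assumed lookup key)
    decoded: str   # stand-in for the already-decoded instruction


def simulate(trace, rob_size=8):
    """Return (reused, fetched) counts for a dynamic instruction trace.

    trace: list of (pc, decoded) pairs in program order.
    The deque drops its oldest entry when full, a crude stand-in
    for retirement; a real ROB frees entries at commit.
    """
    rob = deque(maxlen=rob_size)
    reused = fetched = 0
    for pc, decoded in trace:
        # Assumed matching rule: a ROB entry with the same PC is reusable.
        hit = next((e for e in rob if e.pc == pc), None)
        if hit is not None:
            reused += 1    # fetch/decode could stay gated off here
            rob.append(RobEntry(pc, hit.decoded))
        else:
            fetched += 1   # normal fetch and decode path
            rob.append(RobEntry(pc, decoded))
    return reused, fetched


# Hypothetical 4-instruction loop body executed for 3 iterations.
loop_body = [(0x100 + 4 * i, f"op{i}") for i in range(4)]
trace = loop_body * 3
```

Under these assumptions, a loop body that fits in the ROB is fetched once and reused on later iterations, while a loop longer than the ROB gets no reuse because each copy is evicted before its address recurs.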

