Dynamically allocating processor resources between nearby and distant ILP
Source: Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA)
Göteborg, Sweden
Pages: 26 - 37  
Year of Publication: 2001
ISBN:0-7695-1162-7
Authors
Rajeev Balasubramonian  Department of Computer Science, University of Rochester
Sandhya Dwarkadas  Department of Computer Science, University of Rochester
David H. Albonesi  Department of Electrical and Computer Engineering, University of Rochester
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\TCCA: TC on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/379240.379249

ABSTRACT

Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements.

In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is not constrained by in-order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. The proposed microarchitecture achieves an overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.
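
The idea of dividing a fixed register file between the primary thread and the future thread can be illustrated with a toy model. The abstract does not describe the allocation policy, so everything in this sketch is an assumption made for illustration: the function names `rebalance` and `simulate`, the fixed `step` size, the `min_primary` floor, the 64-register pool, and the use of a per-interval "primary thread stalled" signal as the trigger. It is not the mechanism evaluated in the paper.

```python
def rebalance(future_share, total, primary_stalled, step=4, min_primary=8):
    """Return the future thread's new register share (hypothetical policy).

    Grow the future thread's share while the primary thread is stalled on
    resources; shrink it again when the primary thread is making progress.
    """
    if primary_stalled:
        future_share += step
    else:
        future_share -= step
    # Clamp: the share is never negative, and the primary thread always
    # keeps at least min_primary registers (an assumed floor).
    return max(0, min(future_share, total - min_primary))


def simulate(stall_trace, total=64):
    """Replay a per-interval stall trace; return (primary, future) shares."""
    future = 0
    history = []
    for stalled in stall_trace:
        future = rebalance(future, total, stalled)
        history.append((total - future, future))
    return history


if __name__ == "__main__":
    # Two stalled intervals followed by one interval of forward progress.
    for primary, future in simulate([True, True, False]):
        print(f"primary={primary:2d} future={future:2d}")
```

In this model the two shares always sum to the size of the register file, which mirrors the constraint stated in the abstract: the file cannot grow because of cycle time limits, so any registers given to the future thread are taken from the primary thread.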


