ACM Home Page
Please provide us with feedback. Feedback
Reducing the complexity of the register file in dynamic superscalar processors
Full text Publisher SitePublisher Site PdfPdf (1.34 MB)
Source International Symposium on Microarchitecture archive
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture table of contents
Austin, Texas
SESSION: Superscalar architectures table of contents
Pages: 237 - 248  
Year of Publication: 2001
ISBN ~ ISSN:1072-4451 , 0-7695-1369-7
Authors
Rajeev Balasubramonian  University of Rochester
Sandhya Dwarkadas  University of Rochester
David H. Albonesi  University of Rochester
Sponsors
: IEEE TC-MARCH
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
IEEE Computer Society  Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 42,   Citation Count: 48
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

Dynamic superscalar processors execute multiple instructions out-of-order by looking for independent operations within a large window. The number of physical registers within the processor has a direct impact on the size of this window as most in-flight instructions require a new physical register at dispatch. A large multi-ported register file helps improve the instruction-level parallelism (ILP), but may have a detrimental effect on clock speed, especially in future wire-limited technologies. In this paper, we propose a register file organization that reduces register file size and port requirements for a given amount of ILP. We use a two-level register file organization to reduce register file size requirements, and a banked organization to reduce port requirements. We demonstrate empirically that the resulting register file organizations have reduced latency and (in the case of the banked organization) energy requirements for similar instructions per cycle (IPC) performance and improved instructions per second (IPS) performance in comparison to a conventional monolithic register file. The choice of organization is dependent on design goals.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
D. Burger and T. Austin. The Simplescalar Toolset, Version 2.0. Technical Report TR-97-1342, University of Wisconsin-Madison, June 1997.
 
4
R. Canal, J. M. Parcerisa, and A. Gonzalez. Dynamic Cluster Assignment Mechanisms. In Proceedings of HPCA-6, 2000.
5
6
 
7
D. Bailey, et al. The NAS Parallel Benchmarks. Technical Report TR RNR-94-007, NASA Ames Research Center, March 1994.
 
8
 
9
 
10
L. Gwennap. PA-8500's 1.5M cache aids performance. Microprocessor Report, 11(15), November 17, 1997.
 
11
 
12
 
13
 
14
 
15
 
16
 
17
 
18
19
 
20
 
21
S. Rixner, W. Dally, B. Khailany, P. Mattson, U. Kapasi, and J. Owens. Register Organization for Media Processing. In Proceedings of HPCA-6, Jan 2000.
22
 
23
24
25
26
 
27
 
28
S. Wilton and N. Jouppi. An Enhanced Access and Cycle Time Model for On-Chip Caches. Technical Report TN-93/5, Compaq Western Research Lab, 1993.
 
29
 
30
31

CITED BY  48
Collaborative Colleagues:
Rajeev Balasubramonian: colleagues
Sandhya Dwarkadas: colleagues
David H. Albonesi: colleagues