ABSTRACT
Dynamic superscalar processors execute multiple instructions out-of-order by looking for independent operations within a large window. The number of physical registers within the processor has a direct impact on the size of this window as most in-flight instructions require a new physical register at dispatch. A large multi-ported register file helps improve the instruction-level parallelism (ILP), but may have a detrimental effect on clock speed, especially in future wire-limited technologies. In this paper, we propose a register file organization that reduces register file size and port requirements for a given amount of ILP. We use a two-level register file organization to reduce register file size requirements, and a banked organization to reduce port requirements. We demonstrate empirically that the resulting register file organizations have reduced latency and (in the case of the banked organization) energy requirements for similar instructions per cycle (IPC) performance and improved instructions per second (IPS) performance in comparison to a conventional monolithic register file. The choice of organization is dependent on design goals.