|
ABSTRACT
Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP implies increasing the sizes of structures such as the register file, issue queue, and reorder buffer. Simultaneously, cycle time constraints limit the sizes of these structures, resulting in conflicting design requirements.
In this paper, we present a novel microarchitecture designed to overcome the limitations of a register file size dictated by cycle time constraints. Available registers are dynamically allocated between the primary program thread and a future thread. The future thread executes instructions when the primary thread is limited by resource availability. The future thread is nor constrained by in order commit requirements. It is therefore able to examine a much larger instruction window and jump far ahead to execute ready instructions. Results are communicated back to the primary thread by warming up the register file, instruction cache, data cache, and instruction reuse buffer, and by resolving branch mispredicts early. The proposed microarchitecture is able to get on overall speedup of 1.17 over the base processor for our benchmark set, with speedups of up to 1.64.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Rajeev Balasubramonian , David Albonesi , Alper Buyuktosunoglu , Sandhya Dwarkadas, Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.245-257, December 2000, Monterey, California, United States
[doi> 10.1145/360128.360153]
|
| |
3
|
|
| |
4
|
D. Burger and T. Austin. The Simplescalar Toolset, Version 2.0. Technical Report TR-97-1342, University of Wisconsin-Madison, June 1997.
|
 |
5
|
Robert S. Chappell , Jared Stark , Sangwook P. Kim , Steven K. Reinhardt , Yale N. Patt, Simultaneous subordinate microthreading (SSMT), Proceedings of the 26th annual international symposium on Computer architecture, p.186-195, May 01-04, 1999, Atlanta, Georgia, United States
|
| |
6
|
|
 |
7
|
José-Lorenzo Cruz , Antonio González , Mateo Valero , Nigel P. Topham, Multiple-banked register file architectures, Proceedings of the 27th annual international symposium on Computer architecture, p.316-325, June 2000, Vancouver, British Columbia, Canada
|
| |
8
|
D. Bailey, et al. The NAS Parallel Benchmarks. Technical Report TR RNR-94-007, NASA Ames Research Center, March 1994.
|
| |
9
|
M. Dubois and Y. H. Song. Assisted Execution. Technical Report CENG 98-25, EE-Systems, University of Southern California, Oct 1998.
|
 |
10
|
|
| |
11
|
Alexandre Farcy , Olivier Temam , Roger Espasa , Toni Juan, Dataflow analysis of branch mispredictions and its application to early resolution of branch outcomes, Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, p.59-68, November 1998, Dallas, Texas, United States
|
| |
12
|
|
 |
13
|
|
| |
14
|
|
 |
15
|
|
 |
16
|
|
| |
17
|
Teresa Monreal , Antonio González , Mateo Valero , José González , Victor Viñals, Delaying physical register allocation through virtual-physical registers, Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, p.186-192, November 16-18, 1999, Haifa, Israel
|
| |
18
|
Mayan Moudgill , Keshav Pingali , Stamatis Vassiliadis, Register renaming and dynamic speculation: an alternative approach, Proceedings of the 26th annual international symposium on Microarchitecture, p.202-213, December 01-03, 1993, Austin, Texas, United States
|
 |
19
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
| |
20
|
|
 |
21
|
Subbarao Palacharla , Norman P. Jouppi , J. E. Smith, Complexity-effective superscalar processors, Proceedings of the 24th annual international symposium on Computer architecture, p.206-218, June 01-04, 1997, Denver, Colorado, United States
|
 |
22
|
|
 |
23
|
|
| |
24
|
|
| |
25
|
Eric Rotenberg , Quinn Jacobson , Yiannakis Sazeides , Jim Smith, Trace processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.138-148, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
 |
26
|
Amir Roth , Andreas Moshovos , Gurindar S. Sohi, Dependence based prefetching for linked data structures, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.115-126, October 02-07, 1998, San Jose, California, United States
|
 |
27
|
|
| |
28
|
|
 |
29
|
|
 |
30
|
|
| |
31
|
|
 |
32
|
|
 |
33
|
|
| |
34
|
|
| |
35
|
|
 |
36
|
|
|