A scalable front-end architecture for fast instruction delivery
Source: International Symposium on Computer Architecture
Proceedings of the 26th Annual International Symposium on Computer Architecture
Atlanta, Georgia, United States
Pages: 234 - 245  
Year of Publication: 1999
ISBN: 0-7695-0170-2
Authors
Glenn Reinman, Department of Computer Science and Engineering, University of California, San Diego
Todd Austin, Microcomputer Research Labs, Intel Corporation
Brad Calder, Department of Computer Science and Engineering, University of California, San Diego
Sponsors
IEEE-CS\TCCA: TC on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society, Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 26,   Citation Count: 28
DOI: http://doi.acm.org/10.1145/300979.300999

ABSTRACT

In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the execution core. Attaining these targets is a challenging task due to I-cache misses, branch mispredictions, and taken branches in the instruction stream. To further complicate matters, a VLSI interconnect scaling trend is materializing that further limits the performance of front-end designs in future generation process technologies.

To counter these challenges, we present a fetch architecture that permits a faster cycle time than previous designs and scales better with future process technologies. Our design, called the Fetch Target Buffer, is a multi-level fetch block-oriented predictor. We decouple the FTB from the instruction fetch and decode pipelines to afford it the fastest clock possible. Through cycle-based simulation and circuit-level delay analysis, we find that our multi-level FTB design is capable of delivering instructions 25% faster than the best single-level BTB-based pipeline configuration. Moreover, we show that our design scales better to future process technologies than traditional single-level designs.
