The block-based trace cache
Source
International Symposium on Computer Architecture
Proceedings of the 26th annual international symposium on Computer architecture
Atlanta, Georgia, United States
Pages: 196 - 207
Year of Publication: 1999
ISBN: 0-7695-0170-2
Authors
Bryan Black, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
Bohuslav Rychlik, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
John Paul Shen, Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
Publisher
IEEE Computer Society, Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 5, Downloads (12 Months): 28, Citation Count: 15
ABSTRACT
The trace cache is a recently proposed mechanism for achieving high instruction fetch bandwidth by buffering and reusing dynamic instruction traces. This work presents a new block-based trace cache implementation that can achieve higher IPC performance with more efficient storage of traces. Instead of explicitly storing the instructions of a trace, pointers to the blocks constituting a trace are stored in a much smaller trace table. The block-based trace cache renames fetch addresses at the basic block level and stores aligned blocks in a block cache. Traces are constructed by accessing the replicated block cache using block pointers from the trace table. The performance potential of the block-based trace cache is quantified and compared with perfect branch prediction and perfect fetch schemes. Compared with the conventional trace cache, the block-based design can achieve higher IPC with less impact on cycle time.
Results: Using the SPECint95 benchmarks, a realistic 16-wide design of a block-based trace cache improves performance by 75% over a baseline design and comes within 7% of a baseline design with perfect branch prediction. With idealized trace prediction, the block-based trace cache with a 1K-entry block cache is shown to achieve the same performance as the conventional trace cache with 32K entries.
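The indirection described in the abstract (block pointers in a trace table, instructions held once in a block cache) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name, the method names, the sequential block-ID allocation, and indexing the trace table by the trace's starting fetch address are all hypothetical choices made for illustration. Replication of the block cache and trace prediction are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class BlockBasedTraceCache:
    """Toy model: a trace is a tuple of block IDs, not a copy of instructions."""

    # All three tables are illustrative assumptions, not the paper's structures.
    rename_table: dict = field(default_factory=dict)  # fetch address -> block ID
    block_cache: dict = field(default_factory=dict)   # block ID -> instructions
    trace_table: dict = field(default_factory=dict)   # start address -> block IDs

    def add_block(self, fetch_addr, instructions):
        """Rename a basic block's fetch address to a small ID; store it once."""
        if fetch_addr not in self.rename_table:
            block_id = len(self.rename_table)  # assumed: sequential allocation
            self.rename_table[fetch_addr] = block_id
            self.block_cache[block_id] = list(instructions)
        return self.rename_table[fetch_addr]

    def record_trace(self, block_addrs):
        """Store a trace as pointers to blocks that are already cached."""
        ids = tuple(self.rename_table[addr] for addr in block_addrs)
        self.trace_table[block_addrs[0]] = ids

    def fetch(self, start_addr):
        """Rebuild the instruction stream from block pointers; None on a miss."""
        ids = self.trace_table.get(start_addr)
        if ids is None:
            return None
        return [insn for block_id in ids for insn in self.block_cache[block_id]]


# Usage: two basic blocks shared by reference, one trace built from them.
cache = BlockBasedTraceCache()
cache.add_block(0x100, ["add", "beq"])
cache.add_block(0x200, ["ld", "jmp"])
cache.record_trace([0x100, 0x200])
print(cache.fetch(0x100))
```

In this model a second trace that reuses the block at 0x200 would add only one more pointer to the trace table rather than a second copy of the instructions, which is the storage saving the abstract refers to.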
CITED BY 15
Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung, Sanjay J. Patel, Steven S. Lumetta, Performance characterization of a hardware mechanism for dynamic optimization, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
Yuan Chou, Pazhani Pillai, Herman Schmit, John Paul Shen, PipeRench implementation of the instruction path coprocessor, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.147-158, December 2000, Monterey, California, United States
Baruch Solomon, Avi Mendelson, Doron Orenstein, Yoav Almog, Ronny Ronen, Micro-operation cache: a power aware frontend for the variable instruction length ISA, Proceedings of the 2001 international symposium on Low power electronics and design, p.4-9, August 2001, Huntington Beach, California, United States
Roni Rosner, Micha Moffie, Yiannakis Sazeides, Ronny Ronen, Selecting long atomic traces for high coverage, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA