ABSTRACT
As the issue width of superscalar processors is increased, instruction fetch bandwidth requirements will also increase. It will become necessary to fetch multiple basic blocks per cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. We propose supplementing the conventional instruction cache with a trace cache. This structure caches traces of the dynamic instruction stream, so instructions that are otherwise noncontiguous appear contiguous. For the Instruction Benchmark Suite (IBS) and SPEC92 integer benchmarks, a 4-kilobyte trace cache improves performance on average by 28% over conventional sequential fetching. Further, it is shown that the trace cache's efficient, low-latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.
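The idea in the abstract, caching a trace of the dynamic instruction stream so that noncontiguous instructions can be fetched as one contiguous unit, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the abstract does not specify how traces are identified, so the lookup key (start address plus conditional-branch outcomes), the limits `MAX_TRACE_LEN` and `MAX_BRANCHES`, and the class and method names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only; names and limits are assumptions, not the paper's design.
MAX_TRACE_LEN = 8   # assumed capacity of one trace line, in instructions
MAX_BRANCHES = 2    # assumed limit on conditional branches per trace


class TraceCache:
    def __init__(self):
        # Maps (start_pc, tuple of branch outcomes) -> list of instruction addresses.
        self.lines = {}

    def fill(self, retired):
        """Record one trace from (pc, taken) pairs in dynamic order.

        `taken` is None for non-branches and True/False for conditional branches.
        """
        trace, outcomes = [], []
        for pc, taken in retired:
            if len(trace) == MAX_TRACE_LEN:
                break
            if taken is not None and len(outcomes) == MAX_BRANCHES:
                break
            trace.append(pc)
            if taken is not None:
                outcomes.append(taken)
        if trace:
            self.lines[(trace[0], tuple(outcomes))] = trace

    def lookup(self, fetch_pc, predictions):
        """Return a cached trace starting at fetch_pc whose recorded branch
        outcomes match a prefix of the predictions, or None on a miss."""
        for n in range(len(predictions), -1, -1):
            trace = self.lines.get((fetch_pc, tuple(predictions[:n])))
            if trace is not None:
                return trace
        return None


# Dynamic stream: the branch at 102 is taken to 200, the branch at 201 falls through.
stream = [(100, None), (101, None), (102, True),
          (200, None), (201, False), (202, None), (203, None)]
tc = TraceCache()
tc.fill(stream)
hit = tc.lookup(100, [True, False])    # predictions agree with the recorded path
miss = tc.lookup(100, [False, False])  # predictions disagree: fall back to the instruction cache
```

In this sketch a hit returns addresses 100 to 102 and 200 to 203 as a single unit, even though they span a taken branch. On a miss, fetching would fall back to the conventional instruction cache, which matches the abstract's description of the trace cache as a supplement rather than a replacement.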
CITED BY 110
Daniel Holmes Friendly , Sanjay Jeram Patel , Yale N. Patt, Alternative fetch and issue policies for the trace cache fetch mechanism, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.24-33, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Jared Stark , Paul Racunas , Yale N. Patt, Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.34-43, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Quinn Jacobson , Eric Rotenberg , James E. Smith, Path-based next trace prediction, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.14-23, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Jude A. Rivers , Gary S. Tyson , Edward S. Davidson , Todd M. Austin, On high-bandwidth data cache design for multi-issue processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.46-56, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Michael Bekerman , Adi Yoaz , Freddy Gabbay , Stephan Jourdan , Maxim Kalaev , Ronny Ronen, Early load address resolution via register tracking, ACM SIGARCH Computer Architecture News, v.28 n.2, p.306-315, May 2000
Yuan Chou , Pazhani Pillai , Herman Schmit , John Paul Shen, PipeRench implementation of the instruction path coprocessor, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.147-158, December 2000, Monterey, California, United States
Glenn Reinman , Brad Calder , Dean Tullsen , Gary Tyson , Todd Austin, Classifying load and store instructions for memory renaming, Proceedings of the 13th international conference on Supercomputing, p.399-407, June 20-25, 1999, Rhodes, Greece
Sanjay J. Patel , Tony Tung , Satarupa Bose , Matthew M. Crum, Increasing the size of atomic instruction blocks using control flow assertions, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.303-313, December 2000, Monterey, California, United States
Eric Rotenberg , Quinn Jacobson , Yiannakis Sazeides , Jim Smith, Trace processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.138-148, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Baruch Solomon , Avi Mendelson , Doron Orenstein , Yoav Almog , Ronny Ronen, Micro-operation cache: a power aware frontend for the variable instruction length ISA, Proceedings of the 2001 international symposium on Low power electronics and design, p.4-9, August 2001, Huntington Beach, California, United States
Francisca Quintana , Jesus Corbal , Roger Espasa , Mateo Valero, Adding a vector unit to a superscalar processor, Proceedings of the 13th international conference on Supercomputing, p.1-10, June 20-25, 1999, Rhodes, Greece
Howard Chen , Wei-Chung Hsu , Jiwei Lu , Pen-Chung Yew , Dong-Yuan Chen, Dynamic trace selection using performance monitoring hardware sampling, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, March 23-26, 2003, San Francisco, California
Roni Rosner , Micha Moffie , Yiannakis Sazeides , Ronny Ronen, Selecting long atomic traces for high coverage, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA
Daniel Chaver , Miguel A. Rojas , Luis Pinuel , Manuel Prieto , Francisco Tirado , Michael C. Huang, Energy-aware fetch mechanism: trace cache and BTB customization, Proceedings of the 2005 international symposium on Low power electronics and design, August 08-10, 2005, San Diego, CA, USA
Giuseppe Desoli , Nikolay Mateev , Evelyn Duesterwald , Paolo Faraboschi , Joseph A. Fisher, DELI: a new run-time control point, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, November 18-22, 2002, Istanbul, Turkey
Kevin Skadron , Pritpal S. Ahuja , Margaret Martonosi , Douglas W. Clark, Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques, IEEE Transactions on Computers, v.48 n.11, p.1260-1281, November 1999
Gregory T. Sullivan , Derek L. Bruening , Iris Baron , Timothy Garnett , Saman Amarasinghe, Dynamic native optimization of interpreters, Proceedings of the 2003 workshop on Interpreters, virtual machines and emulators, p.50-57, June 12-12, 2003, San Diego, California
Yale N. Patt , Sanjay J. Patel , Marius Evers , Daniel H. Friendly , Jared Stark, One Billion Transistors, One Uniprocessor, One Chip, Computer, v.30 n.9, p.51-57, September 1997
Alex Ramírez , Josep-L. Larriba-Pey , Carlos Navarro , Josep Torrellas , Mateo Valero, Software trace cache, Proceedings of the 13th international conference on Supercomputing, p.119-126, June 20-25, 1999, Rhodes, Greece
Juan C. Moure , Domingo Benítez , Dolores I. Rexachs , Emilio Luque, Wide and efficient trace prediction using the local trace predictor, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
Oliverio J. Santana , Ayose Falcón , Alex Ramirez , Mateo Valero, Branch predictor guided instruction decoding, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA