| Improving trace cache effectiveness with branch promotion and trace packing |
| Full text |
Pdf
(1.11 MB)
|
| Source
|
International Symposium on Computer Architecture
archive
Proceedings of the 25th annual international symposium on Computer architecture
table of contents
Barcelona, Spain
Pages: 262 - 271
Year of Publication: 1998
ISBN:0-8186-8491-7
Also published in ...
|
|
Authors
|
|
Sanjay Jeram Patel
|
Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan
|
|
Marius Evers
|
Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan
|
|
Yale N. Patt
|
Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan
|
|
| Sponsors |
|
| Publisher |
IEEE Computer Society
Washington, DC, USA
|
| Bibliometrics |
Downloads (6 Weeks): 12, Downloads (12 Months): 31, Citation Count: 20
|
|
|
ABSTRACT
The increasing widths of superscalar processors are placing greater demands upon the fetch mechanism. The trace cache meets these demands by placing logically contiguous instructions in physically contiguous storage. As a result, the trace cache delivers instructions at a high rate by supplying multiple fetch blocks each cycle.In this paper, we examine two techniques to improve the number of instructions delivered each cycle by the trace cache. The first technique, branch promotion, dynamically converts strongly biased branches into branches with static predictions. Because these promoted branches require no dynamic prediction, the branch predictor suffers less from the negative effects of interference.Branch promotion unlocks the potential of the second technique: trace packing. With trace packing, trace segments are packed with as many instructions as will fit, without regard to naturally occurring fetch block boundaries. With both techniques, the effective fetch rate of the trace cache jumps up 17% over a trace cache which implements neither.On a machine where the execution engine has a very aggressive memory disambiguator, the performance of a machine using branch promotion and trace packing is on average 11% higher than a machine using neither technique.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
D. Burger, T. Austin, and S. Bennett. Evaluating future microprocessors: The simplescalar tool set. Technical Report 1308, University of Wisconsin - Madison Technical Report, July 1996.
|
| |
2
|
|
 |
3
|
Po-Yung Chang , Eric Hao , Tse-Yu Yeh , Yale Patt, Branch classification: a new mechanism for improving branch predictor performance, Proceedings of the 27th annual international symposium on Microarchitecture, p.22-31, November 30-December 02, 1994, San Jose, California, United States
[doi> 10.1145/192724.192727]
|
 |
4
|
|
| |
5
|
Daniel Holmes Friendly , Sanjay Jeram Patel , Yale N. Patt, Alternative fetch and issue policies for the trace cache fetch mechanism, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.24-33, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
 |
6
|
|
| |
7
|
S. W. Melvin , M. C. Shebanow , Y. N. Patt, Hardware support for large atomic units in dynamically scheduled machines, Proceedings of the 21st annual workshop on Microprogramming and microarchitecture, p.60-63, November 28-December 02, 1988, San Diego, California, United States
|
 |
8
|
Andreas Moshovos , Scott E. Breach , T. N. Vijaykumar , Gurindar S. Sohi, Dynamic speculation and synchronization of data dependences, Proceedings of the 24th annual international symposium on Computer architecture, p.181-193, June 01-04, 1997, Denver, Colorado, United States
|
| |
9
|
S.J. Patel, D. H. Friendly, and Y. N. Patt. Critical issues regarding the trace cache fetch mechanism. Technical Report CSE-TR-335-97, University of Michigan Technical Report, May 1997.
|
| |
10
|
A. Peleg and U. Weiser. Dynamic flow instruction cache memory organized around trace segments independant of virtual address line. U.S. Patent Number 5,381,533, 1994.
|
| |
11
|
|
| |
12
|
Eric Rotenberg , Quinn Jacobson , Yiannakis Sazeides , Jim Smith, Trace processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.138-148, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
13
|
|
| |
14
|
Jared Stark , Paul Racunas , Yale N. Patt, Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.34-43, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
15
|
|
 |
16
|
|
CITED BY 20
|
|
|
|
|
Sanjay J. Patel , Tony Tung , Satarupa Bose , Matthew M. Crum, Increasing the size of atomic instruction blocks using control flow assertions, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.303-313, December 2000, Monterey, California, United States
|
|
|
|
|
|
|
|
|
Brian Fahs , Satarupa Bose , Matthew Crum , Brian Slechta , Francesco Spadini , Tony Tung , Sanjay J. Patel , Steven S. Lumetta, Performance characterization of a hardware mechanism for dynamic optimization, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Roni Rosner , Micha Moffie , Yiannakis Sazeides , Ronny Ronen, Selecting long atomic traces for high coverage, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA
|
|
|
|
|
|
|
|
|
|
|
|
Yoav Almog , Roni Rosner , Naftali Schwartz , Ari Schmorak, Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.137, March 20-24, 2004, Palo Alto, California
|
|
|
Alex Ramírez , Josep-L. Larriba-Pey , Carlos Navarro , Josep Torrellas , Mateo Valero, Software trace cache, Proceedings of the 13th international conference on Supercomputing, p.119-126, June 20-25, 1999, Rhodes, Greece
|
|
|
|
|
|
|
|
|
|
|