ABSTRACT
Code placement techniques have traditionally improved instruction fetch bandwidth by increasing instruction locality and decreasing the number of taken branches. However, these techniques offer less benefit in the presence of a trace cache, which alters the placement of instructions in the instruction cache. Moreover, as pipelines have become deeper to accommodate increasing clock rates, branch misprediction penalties have become a significant impediment to performance. We evaluate pattern history table partitioning, a feedback-directed code placement technique that explicitly places conditional branches so that they are less likely to interfere destructively with one another in branch prediction tables. On SPEC CPU benchmarks running on an Intel Pentium 4, branch mispredictions are reduced by up to 22%, and by 3.5% on average. This reduction yields a speedup of up to 16.0%, and of 4.5% on average. By contrast, branch alignment, a previous code placement technique, yields a speedup of at most 4.7%, and of less than 1% on average.
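To make the notion of destructive interference concrete, the following is a minimal sketch of a toy pattern history table, not the Pentium 4 predictor, whose indexing is not described here. It assumes a table of 2-bit saturating counters indexed by the low bits of the branch address; the function name `simulate`, the table size, and the branch addresses are all hypothetical choices made for illustration.

```python
def simulate(branches, table_bits=4, rounds=100):
    """Count mispredictions for a toy PHT of 2-bit saturating counters.

    branches: list of (address, taken) pairs executed once per round.
    Assumption: the PHT index is simply address mod table size.
    """
    size = 1 << table_bits
    pht = [1] * size              # every counter starts at "weakly not taken"
    mispredicts = 0
    for _ in range(rounds):
        for address, taken in branches:
            index = address % size
            predicted_taken = pht[index] >= 2
            if predicted_taken != taken:
                mispredicts += 1
            # Saturating update toward the actual outcome.
            if taken:
                pht[index] = min(3, pht[index] + 1)
            else:
                pht[index] = max(0, pht[index] - 1)
    return mispredicts


# Two branches with opposite bias that map to the same counter (index 0).
aliased = [(0x40, True), (0x50, False)]
# The same branches after the second one is placed one slot later (index 1).
separated = [(0x40, True), (0x51, False)]

print(simulate(aliased))    # 200: both branches mispredict every round
print(simulate(separated))  # 1: only the first execution of 0x40 mispredicts
```

In this toy model, changing the address of one branch is enough to remove the conflict, which is the effect a placement technique aims for; a real predictor that also hashes in branch history would need a correspondingly richer model.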