ABSTRACT
Despite continuous improvement in branch prediction algorithms, branch misprediction remains a major limitation on microprocessor performance. As pipelines become wider and deeper, branch prediction will become even more crucial. This paper taps into a currently wasted resource, wrong-path execution, to help improve branch prediction. Because of control independence, the outcomes of branches executed along the wrong path often match their outcomes on the correct path. Current branch prediction methods rely on correlation between branches on the correct path and therefore leave potentially useful wrong-path branch information unexploited. We present a new, very simple, and very effective method that extends branch prediction to allow wrong-path branch outcomes to be recycled at the fetch stage. Simulations of deeply pipelined processors, using a selected set of SpecInt 2000 and other benchmarks with more than 5 branch mispredictions per thousand micro-operations, show that the branch misprediction rate can be reduced by up to 30%. Depending on the pipeline depth, the corresponding average performance improvement varies from 5% to 20%.
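The idea of recycling a wrong-path outcome can be illustrated with a toy model. The sketch below is not the mechanism evaluated in the paper, which the abstract does not specify; it only assumes a baseline 2-bit counter predictor plus a small buffer of outcomes observed on the wrong path. The class name `RecyclingPredictor`, its methods, and the branch addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecyclingPredictor:
    """Toy model (assumption, not the paper's design): a 2-bit counter
    predictor extended with a buffer of wrong-path branch outcomes."""

    counters: dict = field(default_factory=dict)  # pc -> 2-bit saturating counter
    recycled: dict = field(default_factory=dict)  # pc -> outcome seen on the wrong path

    def record_wrong_path(self, pc: int, taken: bool) -> None:
        # Called for a branch that resolved on the wrong path before recovery.
        self.recycled[pc] = taken

    def predict(self, pc: int) -> bool:
        # At fetch, a buffered wrong-path outcome overrides the baseline
        # prediction once, then is discarded.
        if pc in self.recycled:
            return self.recycled.pop(pc)
        return self.counters.get(pc, 1) >= 2  # default state: weakly not-taken

    def update(self, pc: int, taken: bool) -> None:
        # Standard saturating-counter training on the correct path.
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


predictor = RecyclingPredictor()
baseline_guess = predictor.predict(0x40)   # no history yet: predicts not-taken
predictor.record_wrong_path(0x40, True)    # 0x40 resolved as taken on the wrong path
recycled_guess = predictor.predict(0x40)   # recycled outcome is used at fetch
```

In this toy, the branch at `0x40` stands in for a control-independent branch: it is reached on both the wrong and the correct path, so the outcome computed on the wrong path is assumed to still hold after recovery. A real design would also need to decide when a buffered outcome is stale, which this sketch ignores.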