|
ABSTRACT
Modern dynamically scheduled processors use branch prediction hardware to speculatively fetch and execute most likely executed paths in a program. Complex branch predictors have been proposed which attempt to identify these paths accurately such that the hardware can benefit from out-of-order (OOO) execution. Recent studies have shown that inspite of such complex prediction schemes, there still exist many frequently executed branches which are difficult to predict. Predicated execution has been proposed as an alternative technique to eliminate some of these branches in various forms ranging from a restrictive support to a full-blown support. We call the restrictive form of predicated execution as guarded execution.In this paper, we propose a new algorithm which uses profiling and selectively performs if-conversion for architectures with guarded execution support. Branch profiling is used to gather the taken, non-taken and misprediction counts for every branch. This combined with block profiling is used to select paths which suffer from heavy mispredictions and are profitable to if-convert. Effects of three different selection criterias, namely size-based, predictability-based and profiled-based, on net cycle improvements, branch mispredictions and mis-speculated instructions are then studied. We also propose new mechanisms to convert unsafe instructions to safe form to enhance the applicability of the technique. Finally, we explain numerous adjustments that were made to the selection criterias to better reflect the OOO processor behavior.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
J.C. Dehnert, P.Y.T. Hsu and J. P. Bratt, "Overlapped loop support in the Cydra5," Proceedings of the 16th Annual International Symposium on Computer Microarchitecture, 1989
|
| |
2
|
Po-Yung Chang , Eric Hao , Yale N. Patt , Pohua P. Chang, Using predicated execution to improve the performance of a dynamically scheduled machine with speculative execution, Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, p.99-108, June 27-29, 1995, Limassol, Cyprus
|
 |
3
|
|
 |
4
|
Scott A. Mahlke , Richard E. Hank , Roger A. Bringmann , John C. Gyllenhaal , David M. Gallagher , Wen-mei W. Hwu, Characterizing the impact of predicated execution on branch prediction, Proceedings of the 27th annual international symposium on Microarchitecture, p.217-227, November 30-December 02, 1994, San Jose, California, United States
[doi> 10.1145/192724.192755]
|
 |
5
|
Scott A. Mahlke , David C. Lin , William Y. Chen , Richard E. Hank , Roger A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proceedings of the 25th annual international symposium on Microarchitecture, p.45-54, December 01-04, 1992, Portland, Oregon, United States
|
| |
6
|
A. Nicolau and J.A. Fisher, "Measuring the available parallelism for very long instruction word architectures," IEEE- TC, C-33, p 1088-1098 September, 1984
|
 |
7
|
|
 |
8
|
Po-Yung Chang , Eric Hao , Tse-Yu Yeh , Yale Patt, Branch classification: a new mechanism for improving branch predictor performance, Proceedings of the 27th annual international symposium on Microarchitecture, p.22-31, November 30-December 02, 1994, San Jose, California, United States
[doi> 10.1145/192724.192727]
|
| |
9
|
B. Ramakrishna Rau , David W. L. Yen , Wei Yen , Ross A. Towie, The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs, Computer, v.22 n.1, p.12-26, 28-30, 32-35, January 1989
[doi> 10.1109/2.19820]
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
D. Burger, T.M. Austin and S. Bermett, -"Evaluating Future Microprocessors: The Simplescalar Tools Set," Technical Report TR#1308, University of Wisconsin, 1996
|
| |
14
|
S. McFarling, "Combining Branch Predictors," WRL Technical Note TR-36, Digital Equipment Corporation, June 1993.
|
| |
15
|
"WHIRL Intermediate Language Specification," Technical Report, Silicon Graphics Inc., Nov 1999.
|
| |
16
|
|
 |
17
|
John Ruttenberg , G. R. Gao , A. Stoutchinin , W. Lichtenstein, Software pipelining showdown: optimal vs. heuristic methods in a production compiler, Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation, p.1-11, May 21-24, 1996, Philadelphia, Pennsylvania, United States
|
| |
18
|
|
| |
19
|
|
CITED BY 4
|
|
|
|
|
|
|
|
|
|
|
Hyesoon Kim , Onur Mutlu , Jared Stark , Yale N. Patt, Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.43-54, November 12-16, 2005, Barcelona, Spain
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE Design Automation Conference on
Gwo-Dong Chen
, Daniel D. Gajski
|