A comparison of full and partial predicated execution support for ILP processors
Source
International Symposium on Computer Architecture
Proceedings of the 22nd annual international symposium on Computer architecture
S. Margherita Ligure, Italy
Pages: 138-150
Year of Publication: 1995
ISBN: 0-89791-698-0
Authors
Scott A. Mahlke, Hewlett Packard Laboratories, Palo Alto, CA and Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL
Richard E. Hank, Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL
James E. McCormick, Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL
David I. August, Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL
Wen-Mei W. Hwu, Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL
Bibliometrics
Downloads (6 Weeks): 18, Downloads (12 Months): 54, Citation Count: 24
ABSTRACT
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing the number of source operands for all instructions. Full predicate support provides for the most flexibility and the largest potential performance improvements. On the other end, partial predicated execution support, such as conditional moves, requires very little change to existing architectures. This paper presents a preliminary study to qualitatively and quantitatively address the benefit of full and partial predicated execution support. With our current compiler technology, we show that the compiler can use both partial and full predication to achieve speedup in large control-intensive programs. Some details of the code generation techniques are shown to provide insight into the benefit of going from partial to full predication. Preliminary experimental results are very encouraging: partial predication provides an average of 33% performance improvement for an 8-issue processor with no predicate support while full predication provides an additional 30% improvement.
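The contrast the abstract draws between partial and full predication can be illustrated with a small sketch. It is not taken from the paper: the function names, the `cmov` helper, and the `regs` dictionary are hypothetical, and Python is used only to model the semantics. It assumes the usual reading of the two schemes, in which conditional-move support runs both arms of a branch unconditionally and then selects a result, while full predication attaches a guarding predicate to each operation.

```python
def abs_diff_branch(a, b):
    """Baseline: control flow decides which subtraction runs."""
    if a > b:
        return a - b
    return b - a


def cmov(cond, src, dest):
    """Hypothetical model of a conditional move: src if cond holds, else dest."""
    return src if cond else dest


def abs_diff_partial(a, b):
    """Partial predication: both arms run unconditionally, cmov selects."""
    p = a > b            # compare produces the condition
    t1 = a - b           # 'then' arm, computed into a temporary
    x = b - a            # 'else' arm, computed unconditionally
    x = cmov(p, t1, x)   # commit the 'then' result only when p is true
    return x


def abs_diff_full(a, b):
    """Full predication: every operation carries a guarding predicate."""
    regs = {"x": 0}      # hypothetical register file with one register

    def guarded_sub(pred, dest, lhs, rhs):
        # The operation is nullified (no register write) when pred is false.
        if pred:
            regs[dest] = lhs - rhs

    p_true = a > b
    p_false = not p_true
    guarded_sub(p_true, "x", a, b)
    guarded_sub(p_false, "x", b, a)
    return regs["x"]
```

In this model the partial version needs an extra temporary and an extra move to merge the two arms, whereas the fully predicated version writes the destination directly under each predicate. The sketch says nothing about the magnitude of any performance difference.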
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
[2] J. Lee and A. J. Smith, "Branch prediction strategies and branch target buffer design," IEEE Computer, pp. 6-22, January 1984.
[4] M. D. Smith, M. Johnson, M. A. Horowitz, Limits on multiple instruction issue, Proceedings of the third international conference on Architectural support for programming languages and operating systems, p.290-302, April 03-06, 1989, Boston, Massachusetts, United States
[6] Michael Butler, Tse-Yu Yeh, Yale Patt, Mitch Alsup, Hunter Scales, Michael Shebanow, Single instruction stream parallelism is greater than two, Proceedings of the 18th annual international symposium on Computer architecture, p.276-286, May 27-30, 1991, Toronto, Ontario, Canada
[8] B. Ramakrishna Rau, David W. L. Yen, Wei Yen, Ross A. Towle, The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs, Computer, v.22 n.1, p.12-26, 28-30, 32-35, January 1989. doi: 10.1109/2.19820
[9] J. R. Allen, Ken Kennedy, Carrie Porterfield, Joe Warren, Conversion of control dependence to data dependence, Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, p.177-189, January 24-26, 1983, Austin, Texas. doi: 10.1145/567067.567085
[10] J. C. Park and M. S. Schlansker, "On predicated execution," Tech. Rep. HPL-91-58, Hewlett Packard Laboratories, Palo Alto, CA, May 1991.
[11] Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, Roger A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proceedings of the 25th annual international symposium on Microarchitecture, p.45-54, December 01-04, 1992, Portland, Oregon, United States
[12] Scott A. Mahlke, Richard E. Hank, Roger A. Bringmann, John C. Gyllenhaal, David M. Gallagher, Wen-mei W. Hwu, Characterizing the impact of predicated execution on branch prediction, Proceedings of the 27th annual international symposium on Microarchitecture, p.217-227, November 30-December 02, 1994, San Jose, California, United States. doi: 10.1145/192724.192755
[15] V. Kathail, M. S. Schlansker, and B. R. Rau, "HPL PlayDoh architecture specification: Version 1.0," Tech. Rep. HPL-93-80, Hewlett-Packard Laboratories, Palo Alto, CA 94303, February 1994.
[16] Michael Schlansker, Vinod Kathail, Sadun Anik, Height reduction of control recurrences for ILP processors, Proceedings of the 27th annual international symposium on Microarchitecture, p.40-51, November 30-December 02, 1994, San Jose, California, United States. doi: 10.1145/192724.192729
[17] Hewlett-Packard Company, Cupertino, CA, PA-RISC 1.1 Architecture and Instruction Set Reference Manual, 1990.
[18] D. S. Blickstein et al., "The GEM optimizing compiler system," Digital Technical Journal, vol. 4, pp. 121-136, 1992.
[19] P. Geoffrey Lowney, Stefan M. Freudenberger, Thomas J. Karzes, W. D. Lichtenstein, Robert P. Nix, John S. O'Donnell, John Ruttenberg, The multiflow trace scheduling compiler, The Journal of Supercomputing, v.7 n.1-2, p.51-142, May 1993. doi: 10.1007/BF01205182
[20] Wen-Mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Ouellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, Daniel M. Lavery, The superblock: an effective technique for VLIW and superscalar compilation, The Journal of Supercomputing, v.7 n.1-2, p.229-248, May 1993. doi: 10.1007/BF01205185
CITED BY 24
David I. August, John W. Sias, Jean-Michel Puiatti, Scott A. Mahlke, Daniel A. Connors, Kevin M. Crozier, Wen-mei W. Hwu, The program decision logic approach to predicated execution, ACM SIGARCH Computer Architecture News, v.27 n.2, p.208-219, May 1999
David I. August, Daniel A. Connors, Scott A. Mahlke, John W. Sias, Kevin M. Crozier, Ben-Chung Cheng, Patrick R. Eaton, Qudus B. Olaniran, Wen-mei W. Hwu, Integrated predicated and speculative execution in the IMPACT EPIC architecture, ACM SIGARCH Computer Architecture News, v.26 n.3, p.227-237, June 1998
David I. August, Wen-mei W. Hwu, Scott A. Mahlke, A framework for balancing control flow and predication, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.92-103, December 01-03, 1997, Research Triangle Park, North Carolina, United States
Lieven Eeckhout, Tom Vander Aa, Bart Goeman, Hans Vandierendonck, Rudy Lauwereins, Koen De Bosschere, Application domains for fixed-length block structured architectures, Australian Computer Science Communications, v.23 n.4, p.35-44, January 2001
Tom Vander Aa, Lieven Eeckhout, Bart Goeman, Hans Vandierendonck, Tanja Van Achteren, Rudy Lauwereins, Koen De Bosschere, Optimizing a 3D image reconstruction algorithm: investigating the interaction between the high-level implementation, the compiler and the architecture, Australian Computer Science Communications, v.24 n.3, p.119-126, January-February 2002
Jerry Huck, Dale Morris, Jonathan Ross, Allan Knies, Hans Mulder, Rumi Zahir, Introducing the IA-64 Architecture, IEEE Micro, v.20 n.5, p.12-23, September 2000