VLIW compilation techniques in a superscalar environment
Source
Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Orlando, Florida, United States
Pages: 36 - 48
Year of Publication: 1994
ISBN: 0-89791-662-X
Authors
Kemal Ebcioglu
IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY

Randy D. Groves
IBM RISC System/6000 Division, 11400 Burnet Road, Austin, TX

Ki-Chang Kim
IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY

Gabriel M. Silberman
IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY

Isaac Ziv
IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY
ABSTRACT
We describe techniques for converting the intermediate code representation of a given program, as generated by a modern compiler, to another representation which produces the same run-time results, but can run faster on a superscalar machine. The algorithms, based on novel parallelization techniques for Very Long Instruction Word (VLIW) architectures, find and place together independently executable operations that may be far apart in the original code, i.e., they may be separated by many conditional branches or belong to different iterations of a loop. As a result, the functional units in the superscalar are presented with more work that can proceed in parallel, thus achieving higher performance than the approach of using hardware instruction dispatch techniques alone.

While general scheduling techniques improve performance by removing idle pipeline cycles, further improving performance on a superscalar with only a few functional units requires a reduction in the pathlength. We have designed a set of new algorithms for reducing pathlength and removing stalls due to branches, namely speculative load-store motion out of loops, unspeculation, limited combining, basic block expansion, and prolog tailoring. These algorithms were implemented in a prototype version of the IBM RS/6000 xlc compiler and have shown significant improvement in SPEC integer benchmarks on the IBM POWER machines.

Also, we describe a new technique to obtain profiling information with low overhead, and some applications of profiling directed feedback, including scheduling heuristics, code reordering and branch reversal.
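As a rough illustration of what moving loads and stores out of a loop can mean at the source level, the sketch below contrasts a loop that updates memory on every taken branch with a hand-transformed version that keeps the value in a local variable. This is not the paper's algorithm, which operates on the compiler's intermediate code; the function names and the `dirty` flag are hypothetical, and the sketch assumes `total` does not alias the array `a`.

```c
#include <stddef.h>

/* Before: *total is loaded and stored on every iteration that takes the branch. */
long sum_positive_before(const int *a, size_t n, long *total) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0) {
            *total += a[i];   /* load *total, add, store *total */
        }
    }
    return *total;
}

/* After (hand-applied sketch): the load is hoisted above the loop and the
 * store is sunk below it, so the loop body touches only a local variable.
 * The hoisted load is speculative in the sense that it executes even when
 * no element is positive and the original loop would never have read *total. */
long sum_positive_after(const int *a, size_t n, long *total) {
    long t = *total;          /* hoisted load */
    int dirty = 0;            /* hypothetical flag: was *total ever updated? */
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0) {
            t += a[i];        /* no memory traffic for *total inside the loop */
            dirty = 1;
        }
    }
    if (dirty) {
        *total = t;           /* sunk store, done only if the original would have stored */
    }
    return t;
}
```

Both functions return the same value and leave `*total` in the same state; the second simply executes fewer loads and stores inside the loop, which is one way a shorter pathlength can arise.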
CITED BY 10
Rajiv Gupta, David A. Berson, Jesse Z. Fang, Resource-sensitive profile-directed data flow analysis for code optimization, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.358-368, December 01-03, 1997, Research Triangle Park, North Carolina, United States

W. Zhang, M. Karakoy, M. Kandemir, G. Chen, A compiler approach for reducing data cache energy, Proceedings of the 17th annual international conference on Supercomputing, June 23-26, 2003, San Francisco, CA, USA

Michael Gschwind, Kemal Ebcioğlu, Erik Altman, Sumedh Sathaye, Binary translation and architecture convergence issues for IBM system/390, Proceedings of the 14th international conference on Supercomputing, p.336-347, May 08-11, 2000, Santa Fe, New Mexico, United States