ABSTRACT
Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to exploit due to its irregularity. In this article, we introduce a new code-scheduling technique for irregular ILP called “selective scheduling” which can be used as a component for superscalar and VLIW compilers. Selective scheduling can compute a wide set of independent operations across all execution paths based on renaming and forward-substitution and can compute available operations across loop iterations if combined with software pipelining. This scheduling approach has better heuristics for determining the usefulness of moving one operation versus moving another and can successfully find useful code motions without resorting to branch profiling. The compile-time overhead of selective scheduling is low due to its incremental computation technique and its controlled code duplication. We parallelized the SPEC integer benchmarks and five AIX utilities without using branch probabilities. The experiments indicate that a fivefold speedup is achievable on realistic resources with a reasonable overhead in compilation time and code expansion and that a solid speedup increase is also obtainable on machines with fewer resources. These results improve previously known characteristics of irregular ILP.
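The abstract names renaming and forward-substitution as the basis for moving operations, but it does not show either one. The sketch below is one way to picture them on straight-line code. It is an illustration only, not the article's algorithm. The names `Op` and `hoist_over` and the three-address encoding are hypothetical. The sketch also assumes register-only operations with no memory or side-effect dependences, and it ignores liveness on other execution paths.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical three-address operation: dest = opcode(srcs)."""
    dest: str
    opcode: str              # "copy" marks a register copy; anything else is opaque
    srcs: Tuple[str, ...]


_fresh = count()             # source of fresh register names for renaming


def hoist_over(op: Op, above: Op) -> Optional[Tuple[Op, Optional[Op]]]:
    """Try to move `op` above the operation `above` that precedes it.

    Returns (hoisted_op, copy_left_behind), or None when a true
    dependence on a non-copy operation blocks the move.
    """
    srcs = op.srcs
    if above.dest in srcs:
        if above.opcode != "copy":
            return None      # op needs a value that `above` computes
        # Forward substitution: read the copy's source instead of its target.
        srcs = tuple(above.srcs[0] if s == above.dest else s for s in srcs)

    dest, left_behind = op.dest, None
    if dest in above.srcs or dest == above.dest:
        # Renaming: write a fresh register so `above` still sees the old
        # value, and leave a copy at op's original position.
        fresh = f"{dest}_r{next(_fresh)}"
        left_behind = Op(dest, "copy", (fresh,))
        dest = fresh
    return Op(dest, op.opcode, srcs), left_behind
```

For example, hoisting `y = add x, 1` over `x = copy z` yields `y = add z, 1` with nothing left behind. Hoisting `x = load p` over `t = add x, 1` yields a load into a fresh register, plus a copy back to `x` at the original position.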