Parallelizing nonnumerical code with selective scheduling and software pipelining
Source: ACM Transactions on Programming Languages and Systems (TOPLAS)
Volume 19, Issue 6 (November 1997)
Pages: 853-898
Year of Publication: 1997
ISSN: 0164-0925
Authors
Soo-Mook Moon, Seoul National Univ., Seoul, Korea
Kemal Ebcioğlu, IBM T. J. Watson Research Center, Yorktown Heights, NY
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/267959.269966

ABSTRACT

Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to exploit due to its irregularity. In this article, we introduce a new code-scheduling technique for irregular ILP called “selective scheduling” which can be used as a component for superscalar and VLIW compilers. Selective scheduling can compute a wide set of independent operations across all execution paths based on renaming and forward-substitution and can compute available operations across loop iterations if combined with software pipelining. This scheduling approach has better heuristics for determining the usefulness of moving one operation versus moving another and can successfully find useful code motions without resorting to branch profiling. The compile-time overhead of selective scheduling is low due to its incremental computation technique and its controlled code duplication. We parallelized the SPEC integer benchmarks and five AIX utilities without using branch probabilities. The experiments indicate that a fivefold speedup is achievable on realistic resources with a reasonable overhead in compilation time and code expansion and that a solid speedup increase is also obtainable on machines with fewer resources. These results improve previously known characteristics of irregular ILP.
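
The abstract names renaming and forward-substitution as the basis for finding operations that can be moved upward. As a rough illustration only, the toy sketch below shows what those two transformations do when a single operation is hoisted above one neighbor in straight-line code. The `Op` record, the `move_above` helper, and the register names are hypothetical and are not taken from the article; the article's algorithm works across execution paths and loop iterations, which this sketch does not attempt.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical three-address operation: dest = opcode(srcs)."""
    dest: str
    opcode: str              # "copy" or any other opcode name
    srcs: Tuple[str, ...]


def move_above(op: Op, above: Op, fresh: str) -> Optional[Tuple[Op, Optional[Op]]]:
    """Try to hoist `op` above the operation `above` that precedes it.

    Returns (hoisted_op, copy_left_behind_or_None), or None when a true
    dependence on a non-copy operation blocks the motion.
    """
    srcs = op.srcs
    if above.dest in srcs:
        if above.opcode != "copy":
            return None  # op needs a value computed by `above`
        # Forward substitution: read the copy's source instead of its target.
        srcs = tuple(above.srcs[0] if s == above.dest else s for s in srcs)
    hoisted = replace(op, srcs=srcs)
    # Renaming: if `above` reads or writes op's destination, write to a
    # fresh register and leave a copy at op's original position.
    if op.dest in above.srcs or op.dest == above.dest:
        hoisted = replace(hoisted, dest=fresh)
        return hoisted, Op(op.dest, "copy", (fresh,))
    return hoisted, None
```

For example, hoisting `z = add x, w` above `x = copy y` yields `z = add y, w` with nothing left behind, while hoisting `y = add a, b` above `x = load y` yields `t1 = add a, b` and leaves `y = copy t1` in the original position.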


