ACM Home Page
Please provide us with feedback. Feedback
Compiler transformations for high-performance computing
Full text PdfPdf (6.32 MB)
Source ACM Computing Surveys (CSUR) archive
Volume 26 ,  Issue 4  (December 1994) table of contents
Pages: 345 - 420  
Year of Publication: 1994
ISSN:0360-0300
Authors
David F. Bacon  Computer Science Division, University of California, Berkeley, California
Susan L. Graham  Computer Science Division, University of California, Berkeley, California
Oliver J. Sharp  Computer Science Division, University of California, Berkeley, California
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 52,   Downloads (12 Months): 449,   Citation Count: 118
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/197405.197406
What is a DOI?

ABSTRACT

In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis.This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application.Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
ABU-SUFAH, W., KUCK, D. J., AND LAWRIE, I). 1981 On the performance enhancement of paging systems through program analysis and transformations IEEE Trans. Comput. C-30, 5 (May), 341-356.
 
4
ACKERSLAXq, W B. 1982. Data flow languages. Computer 15, 2 (Feb.), 15-25.
5
 
6
7
 
8
 
9
 
10
ALLEN, F. E. 1969. Program optimization. In Annual Rewew ~n Automatic Programming 5. International Tracts in Computer Science and Technology and their Application, vol. 13. Pergamon Press, Oxford, England, 239-307.
 
11
ALLEN, F. E. AND COCKE, J. 1971. A catalogue of optimizing transformations. In Design and Opt~m~zat~on of Compilers, R Rustin, Ed. Prentice-Hall, Englewood Cliffs, N.J., 1-30.
12
13
 
14
15
 
16
ALLEN, F. E., COCKE, J., AND KENNEDY, K. 1981. Reduction of operator ~trength. In Program Flow Analysis: Theory and Apphcations, S. S. Muchnik and N. D. Jones, Eds., Prentice-Hall, Englewood Cliffs, N.J., 79-101.
17
 
18
 
19
AMERICAN NATIONAL STANDARDS INSTITUTE. 1987. An American National standard, IEEE standard for binary floating-point arithmetic SIGPLAN Not. 22, 2 (Feb.), 9-25.
20
21
 
22
23
 
24
 
25
ARWND, KATHAIL, V., AND PINGALI, K. 1980. A dataflow architecture with tagged tokens. Tech. Rep TM-174, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
 
26
 
27
BAILEY, D. H. 1992. On the behavior of cache memories with strided data access. Tech. Rep. RNR- 92-015, NASA Ames Research Center, Moffett Field, Calif., (May).
 
28
BAILEY, D. H., BARszcz, E., BARTON, J. T., BROWNING, D. S., CARTER, R. L., DAGUM, L., FATOOHI, R. A., FREDERICKSON, P. 0., LASINSKI, T. A., SCHREIBER, R. S., SIMON, H. D., VENKATAKRISHNAN, V., AND WEERATUNGA, S. K. 1991. The NAS parallel benchmarks. Int. J. Supercomput. Appl. 5, 3 (Fall), 63-73.
 
29
 
30
BALASUNDARAM, V., Fox, G., KENNEDY, K., AND KREMER, V. 1990. An interactive environment for data partitioning and distribution. In Proceedtngs of the 5th D~strzbuted Memory Computer Conference (Charleston, South Carolina, Apr.). IEEE Computer Society Press, Los Alamitos, Calif.
31
32
 
33
B~NE~EE, U. 1991. Unimodular transformations of double loops. In Advances in Languages and Compilers for Parallel Processing, A. Nicolau, Ed. Research Monographs in Parallel and Distributed Computing, MIT Press, Cambridge, Mass., Chap. 10.
 
34
 
35
BnNERJEE, U. 1988b. An introduction to a formal theory of dependence analysis. J. Supercomput. 2, 2 (Oct.), 133-149.
 
36
 
37
BANERJEE~ U., CHEN, S. C., KUCK, D. J., AND TOWLE, R. A. 1979. Time and parallel processor bounds for Fortran-like loops. IEEE Trans. Comput. C-28, 9 (Sept.), 660-670.
38
 
39
 
40
BERRY, M., CHEN, D., Koss, P., KUCK, D., Lo, S., PANG, Y., POINTER, L., ROLOFF, R., SAMEH, A., CLEMENTI, E_. CHIN. S.. SCHNEIDER. D.. FOX. FT., MESSINA, P., WALKER, D., HSIUNG, C., SCHWARZMEIER, J., LUE, K., ORSZAG, S., SEIDL, F., JOHNSON, O., GOODRUM, R., AND MARTIN, J. 1989. The Perfect Club benchmarks: Effective performance evaluation of supercomputers. Int. J. Supercomput. Appl. 3, 3 (Fall), 5-40.
 
41
42
43
 
44
BURNETT, G. J. AND COFFMAN, E. G., JR. 1970. A study of interleaved memory systems. In Proceedings of the Spring Joint AFIPS Computer Conference. vol. 36. AFIPS, Montvale, N.J., 467-474.
45
46
 
47
 
48
CALLAHAN, D. AND KENNEDY, K. 1988b. Compiling programs for distributed-memory multiprocessots. J. Supercomput. 2, 2 (Oct.), 151-169.
49
 
50
51
 
52
53
 
54
CHAITIN, G. J., AUSLANDER, M. A., CHANDRA, A. K., COOKE, J., HOPKINS, M. E., AND MARKSTEIN, P. W. 1981. Register allocation via coloring. Comput. Lang. 6, 1 (Jan.), 47-57.
55
 
56
57
 
58
CHEN, S. C. AND KUCK, D. J. 1975. Time and parallel processor bounds for linear recurrence systems. IEEE Trans. Comput. C-24 7 (July), 701-717.
59
60
61
 
62
63
64
 
65
COCKE, J. AND MARKSTEIN, P. 1980. Measurement of program improvement algorithms. In Proceedrags of the IFIP Congress (Tokyo, Japan, Oct.). North-Holland, Amsterdam, Netherlands, 221-228. Also available as IBM Tech. Rep. RC 8111, Feb. 1980.
 
66
 
67
68
 
69
COOPER, K. D., HALL, M. W., AND KENNEDY, K. 1993. A methodology for procedure cloning. Comput. Lang. 19, 2 (Apr.), 105-117.
70
 
71
 
72
CRAY RESEARCH 1988 CFT77 Reference Manual. Publication SR-0018-C. Cray Research, Inc., Eagan, Minn.
 
73
CYTRON, R. 1986. Doacross: Beyond vectorization for multiprocessors. In Proceedings of the Internatwnal Conference on Parallel Processing (St. Charles, Ill., Aug.). IEEE Computer Society, Washington, D.C., 836-844.
 
74
CYTRON, R. AND FERRANTE, J. 1987. What's in a name? -or- the value of renaming for parallelism detection and storage allocation. Proceedings of the Internatwnal Conference on Parallel Processing (University Park, Penn., Aug.). Pennsylvania State University Press, University Park, Pa., 19-27.
 
75
DANTZIG, G. B. AND EAVES, B. C. 1974. Fourier- Motzkin ehmination and sits dual with application to integer programming, in Combinatorial Programming: Methods and Applications (Versailles, France, Sept.). B. D. Roy, Ed. D. Reidel, Boston, Mass., 93-102.
 
76
DENNIS, J. B. 1980 Data flow supercomputers_ Computer 13, 11 (Nov.), 48-56.
 
77
 
78
DONGARRA, J. AND HIND, A. R. 1979. Unrolling loops in Fortran. Softw. Pratt. Exper. 9, 3 (Mar.), 219-226.
79
 
80
 
81
82
 
83
FEAUTRIER, P. 1991. Dataflow analysis of array and scalar references. Int. J. Parall. Program. 20, 1 (Feb.), 23-52.
84
 
85
FERRARI, D. 1976. The improvement of program behavior. Computer 9, 11 (Nov.), 39-47.
 
86
FISHER, J.A. 1981. Trace scheduling: A technique for global microcode compaction. IEEE Trans. Comput. C-30, 7 (July), 478-490.
87
 
88
FLOATING POINT SYSTEMS. 1979. FPS AP-120B Processor Handbook. Floating Point Systems, Inc., Beaverton, Ore.
 
89
FREE SOFTWARE FOUNDATION. 1992. gCC 2.X Reference Manual. Free Software Foundation, Cambridge, Mass.
90
 
91
 
92
 
93
 
94
95
96
97
 
98
 
99
GUPTA, M., MIDKIFF, S., SCHONBERG, E., SWEENEY, P., WANG, K. Y., AND BURKE, M. 1993. PTRAN II: A compiler for High Performance Fortran. In Proceedings of the 4th International Workshop on Compilers for Parallel Computers (Delft, Netherlands, Dec.). 479-493.
100
 
101
HARRIS, K. AND HOBBS, S. 1994. VAX Fortran. In Optimization in Compilers, F. E. Allen, R. Cytron, B. K. Rosen, and K. Zadeck, Eds. ACM Press, New York, chap. 16. To be published.
 
102
HATFIELD, D. J. AND GERALD, j. 1971. Program restructuring for virtual memory. IBM Syst. J. 10, 3, 168-192.
 
103
 
104
HEWLETT-PACKARD. 1992. PA-RISC 1.1 Architecture and Instruction Manual. 2nd ed. Part Number 09740-90039. Hewlett-Packard, Palo Alto, Ca}i~
 
105
HIGH PERFORMANCE FORTRAN FORUM. 1993. High Performance Fortran language specification, version 1.0. Tech. Rep. CRPC-TR92225, Rice University, Houston, Tex.
106
107
 
108
109
 
110
 
111
IBM. 1992. Optimization and Tuning Gutde for the XL Fortran and XL C Compilers. 1st ed. Publication SC09-1545-00, IBM, Armonk, N.Y.
 
112
IBM. 1991. IBM RISC System~6000 NIC Tuning Guide for Fortran and C Publication GG24- 3611-01, IBM, Armonk, N.Y.
113
 
114
 
115
 
116
KENNEDY, K., MCKINLEY, K. S., AND TSENG, C.-W. 1993. Analysis and transformation in an interactive parallel programming tool. Concurrency Pract. Exper 5, 7 (Oct.), 575-602.
117
 
118
 
119
 
120
KNOBE, K., LUKAS, J D., AND STEELE, G. L., JR 1988. Massively parallel data optimization. In The 2nd. Symposium on the Frontiers of Massively Parallel Computation (Fairfax, Va., Oct.). IEEE, Piscataway, N.J., 551-558.
 
121
KNUTH, D. E. 1971. An empirical study of Fortran programs. Soflw. Pract. Exper. 1, 2 (Apr.-June), 105-133.
 
122
 
123
124
 
125
126
 
127
KUCK, D. J. AND STOKES, R. 1982. The Burroughs Scientific Processor (BSP). IEEE Trans. Comput. C-31, 5 (May), 363-376.
128
129
130
131
132
133
 
134
 
135
136
 
137
 
138
LI, J. AND CHEN, M. 1990. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays. In The 3rd Symposium on the Frontiers of Massively Parallel Computation, J. Jaja, Ed. IEEE Computer Society Press, Los Alamitos, Calif., 424-433.
 
139
LI, Z. AND YEW, P. 1988. Interprocedura} analysis for parallel computing. In Proceedings of the International Conference on Parallel Processing. F. A. Briggs, Ed. Vol. 2. Pennsylvania State University Press, University Park, Pa., 221-228.
 
140
141
142
 
143
 
144
MACE, M. E. AND WAGNER, R. A. 1985. Globally optimum selection of memory storage patterns. In Proceedings of the International Conference on Parallel Processing, D. DeGrott, Ed. IEEE Computer Society, Washington, D.C., 264-271.
 
145
MARKSTEIN, P., MARKSTEIN, V., AND ZADECK, K. 1994. Strength reduction. In Optimizatmn in Compilers. ACM Press, New York, Chap. 9. To be published.
146
147
148
149
 
150
McGRAw, J. R. 1985. SISAL: Streams and iteration in a single assignment language. Tech. Rep. M-146, Lawrence Livermore National Laboratory, Livermore, Calif.
 
151
MCMAHON, F. M. 1986. The Livermore Fortran kernels: A computer test of numerical performance range. Tech. Rep. UCRL-55745, Lawrence Livermore National Laboratory, Livermore, Calif.
 
152
MICHIE, D. 1968. "Memo" functions and machine learning. Nature 218, 19-22.
153
154
155
 
156
 
157
MUm~OKA, Y. 1971. Parallelism exposure and exploitation in programs. Ph.D. thesis, Tech. Rep. 71-424, Univ. of Illinois at Urbana-Champaign.
158
 
159
 
160
NICOLAU, A. AND FISHER, J. A. 1984. Measuring the parallelism available for very long instruction word architectures. IEEE Trans. Comput. C-33, 11 (Nov.), 968-976.
 
161
NIKHIL, R. S. 1988. ID reference manual, version 88.0. Tech. Rep. 284, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
162
 
163
O'BRIEN, K., HAY, B., MINISH, J., SCHAFFER, H., SCHLOSS, B., SHEPHERD, A., AND ZALESKI, M. 1990. Advanced compiler technology for the RISC System/6000 architecture. In IBM RISC System/6000 Technology. Publication SA23- 2619. IBM Corporation, Mechanlcsburg, Penn
 
164
 
165
 
166
PADUA, D. A. AND PETERSEN, P. M. 1992. Evaluation of paralleIizing compilers. Parallel Computing and Transputer Applications, (Barcelona, Spain. Sept.). CIMNE, 1505-1514. Also available as Center for Supercomputing Research and Development Tech Rep. 1173.
167
 
168
PADUA, D. A., KUCK, D. j. AND LAWRIE, D. 1980. High-speed multiprocessors and compilation techniques. IEEE Trans. Comput. C-29, 9 (Sept.), 763-776.
169
170
 
171
 
172
 
173
POLYCHRONOPOULOS, C. D. 1987b. Loop coalescing: A compiler transformation for parallel machines. In Proceedings of the International Conference on Parallel Processing (University Park, Penn., Aug.). Pennsylvania State University Press, University Park, Pa., 235-242.
 
174
 
175
POLYCHRONOPOULOS, C. D., GIRKAR, M., HAGHIGHAT, M. R., LEE, C. L., LEUNG, B., AND SCHOUTEN, D. 1989. Parafrase-2: An environment for parallelizing, partitioning, synchronizing, and scheduling programs on multiprocessors. In Proceedings of the International Conference on Parallel Processing. Volume 2. Pennsylvania State University Press, University Park, Pa., 39-48.
176
177
178
 
179
 
180
181
 
182
RISEMAN, E. M. AND FOSTER, C. C. 1972. The inhibition of potential parallelism by conditional jumps. IEEE Trans. Comput. C-21~ 12 (Dec.), 1405-1411.
 
183
184
185
 
186
187
188
 
189
 
190
SINGH, J. P. AND HENNESSY, J. L. 1991. An empirical investigation of the effectiveness and limitations of automatic parallelization. In Proceedings of the International Symposium on Shared Memory Multiprocessing, (Apr.). 213-240.
191
 
192
 
193
194
195
 
196
 
197
SUN MICROSYSTEMS. 1991. SPARC Architecture Manual, Version 8. Part No. 800-1399-08. Sun Microsystems, Mountain View, Calif.
198
 
199
 
200
THINKING MACHINES. 1989. Connection Machine, Model CM-2 Technical Summary. Thinking Machines Corp., Cambridge, Mass.
 
201
TJADEN, G. S. AND FLYNN, M. J. 1970. Detection and parallel execution of parallel instructions. IEEE Trans. Comput. C-19, 10 (Oct.), 889-895.
 
202
 
203
204
 
205
 
206
207
208
 
209
 
210
WANG, Ko AND GANNON, D. 1989. Applying AI techniques to program optimization for parallel computers, in Parallel Processing for Supercomputers and Artificial Intelligence, K. Hwang and D. Degroot, Eds. McGraw Hill, New York, Chap. 12.
211
212
213
214
 
215
216
 
217
 
218
 
219
 
220
ZIMA, H. P, BAST, H. J, AND GERNDT, M. 1988. SUPERB. A tool for semi-automatic S1MD/MIMD parallelization Parall. Comput 6, 1 (Jan.J, 1-18.

CITED BY  118


REVIEW

"Max Hailperin : Reviewer"

This is a comprehensive survey of high-level optimizing transformations used in compilers for superscalar, vector, and multiprocessor machines. The emphasis is on transformations endemic to these architectures; supporting analyses and more classic  more...

Collaborative Colleagues:
David F. Bacon: colleagues
Susan L. Graham: colleagues
Oliver J. Sharp: colleagues