|
ABSTRACT
As shared-memory multiprocessor systems become widely available, there is an increasing need for tools to simplify the task of developing parallel programs. This paper describes one such tool, the automatic parallelization system in the Stanford SUIF compiler. This article represents a culmination of a several-year research effort aimed at making parallelizing compilers significantly more effective. We have developed a system that performs full interprocedural parallelization analyses, including array privatization analysis, array reduction recognition, and a suite of scalar data-flow analyses including symbolic analysis. These analyses collaborate in an integrated fashion to exploit coarse-grain parallel loops, computationally intensive loops that can execute on multiple processors independently with no cross-processor synchronization or communication. The system has successfully parallelized large interprocedural loops over a thousand lines of code completely automatically from sequential applications.This article provides a comprehensive description of the analyses in the SUIF system. We also present extensive empirical results on four benchmark suites, showing the contribution of individual analysis techniques both in executing more of the computation in parallel, and in increasing the granularity of the parallel computations. These results demonstrate the importance of interprocedural array data-flow analysis, array privatization and array reduction recognition; a third of the programs spend more than 50&percent; of their execution time in computations that are parallelized with these techniques. Overall, these results indicate that automatic parallelization can be effective on sequential scientific computations, but only if the compiler incorporates all of these analyses.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Vikram Adve , Alan Carle , Elana Granston , Seema Hiranandani , Ken Kennedy , Charles Koelbel , Ulrich Kremer , John Mellor-Crummey , Scott Warren , Chau-Wen Tseng, Requirements for Data-Parallel Programming Environments, IEEE Parallel & Distributed Technology: Systems & Technology, v.2 n.3, p.48-58, September 1994
[doi> 10.1109/M-PDT.1994.329801]
|
| |
2
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
 |
3
|
|
| |
4
|
|
| |
5
|
Saman P. Amarasinghe , Jennifer M. Anderson , Christopher S. Wilson , Shih-Wei Liao , Brian R. Murphy , Robert S. French , Monica S. Lam , Mary W. Hall, Multiprocessors from a Software Perspective, IEEE Micro, v.16 n.3, p.52-61, June 1996
[doi> 10.1109/40.502406]
|
 |
6
|
|
 |
7
|
Jennifer M. Anderson , Saman P. Amarasinghe , Monica S. Lam, Data and computation transformations for multiprocessors, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.166-178, July 19-21, 1995, Santa Barbara, California, United States
|
| |
8
|
Bailey, D., Barton, J., Lasinski, T., and Simon, H. 1991. The NAS parallel benchmarks. Int. J. Supercomput. Appl. 5, 3 (Fall), 63--73.
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
 |
12
|
|
| |
13
|
Blelloch, G. E. 1990. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pa. Nov. (Also published in Synthesis of Parallel Algorithms, ed. J. E. Reif.)
|
| |
14
|
|
| |
15
|
William Blume , Ramon Doallo , Rudolf Eigenmann , John Grout , Jay Hoeflinger , Thomas Lawrence , Jaejin Lee , David Padua , Yunheung Paek , Bill Pottenger , Lawrence Rauchwerger , Peng Tu, Parallel Programming with Polaris, Computer, v.29 n.12, p.78-82, December 1996
[doi> 10.1109/2.546612]
|
| |
16
|
|
| |
17
|
|
 |
18
|
Edouard Bugnion , Jennifer M. Anderson , Todd C. Mowry , Mendel Rosenblum , Monica S. Lam, Compiler-directed page coloring for multiprocessors, Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, p.244-255, October 01-04, 1996, Cambridge, Massachusetts, United States
|
 |
19
|
|
 |
20
|
|
| |
21
|
Cooper, K., Hall, M., and Kennedy, K. 1993. A methodology for procedure cloning. Comput. Lang. 19, 2 (Apr.).
|
| |
22
|
|
 |
23
|
|
| |
24
|
Cooper, K. D., Hall, M. W., Kennedy, K., and Torczon, L. 1995. Interprocedural analysis and optimization. Commun. Pure Appl. Math. 48.
|
| |
25
|
Creusillet, B. 1996. Array region analyses and applications. Ph.D. dissertaion. Ecole des Mines de Paris.
|
| |
26
|
|
 |
27
|
George Cybenko , Lyle Kipp , Lynn Pointer , David Kuck, Supercomputer performance evaluation and the Perfect Benchmarks, Proceedings of the 4th international conference on Supercomputing, p.254-266, June 11-15, 1990, Amsterdam, The Netherlands
|
| |
28
|
Dantzig, G. and Eaves, B. 1973. Fourier-Motzkin elimination and its dual. J. Combinat. Theory (A) 14, 288--297.
|
 |
29
|
|
| |
30
|
Feautrier, P. 1988b. Parametric integer programming. Recherche Operationnelle/Oper. Res. 22, 3 (Sept.), 243--268.
|
| |
31
|
Feautrier, P. 1991. Dataflow analysis of scalar and array references. Int. J. Paral. Prog. 20, 1 (Feb.), 23--52.
|
 |
32
|
Gina Goff , Ken Kennedy , Chau-Wen Tseng, Practical dependence testing, Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, p.15-29, June 24-28, 1991, Toronto, Ontario, Canada
|
 |
33
|
|
 |
34
|
|
| |
35
|
|
| |
36
|
Grout, J. 1995. Inline expansion for the Polaris research compiler. M.S. thesis, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign.
|
 |
37
|
|
| |
38
|
Mary W. Hall , Jennifer M. Anderson , Saman P. Amarasinghe , Brian R. Murphy , Shih-Wei Liao , Edouard Bugnion , Monica S. Lam, Maximizing Multiprocessor Performance with the SUIF Compiler, Computer, v.29 n.12, p.84-89, December 1996
[doi> 10.1109/2.546613]
|
 |
39
|
|
 |
40
|
Mary H. Hall , Saman P. Amarasinghe , Brian R. Murphy , Shih-Wei Liao , Monica S. Lam, Detecting coarse-grain parallelism using an interprocedural parallelizing compiler, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.49-es, December 04-08, 1995, San Diego, California, United States
[doi> 10.1145/224170.224337]
|
| |
41
|
|
| |
42
|
Hall, M. W., Murphy, B. R., and Amarasinghe, S. P. 1995b. Interprocedural analysis for parallelization: Design and experience. In Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing (San Francisco, Calif.). SIAM, Philadelphia, Pa, 650--655.
|
| |
43
|
|
| |
44
|
Harrison, W. 1989. The interprocedural analysis and automatic parallelization of Scheme programs. Lisp and Symbolic Computation 2, 3/4 (Oct.), 179--396.
|
| |
45
|
|
| |
46
|
|
| |
47
|
|
 |
48
|
|
| |
49
|
|
 |
50
|
|
 |
51
|
|
| |
52
|
Kennedy, K. 1976. A comparison of two algorithms for global data flow analysis. SIAM J. Computing 5, 1, 158--180.
|
 |
53
|
|
 |
54
|
Zhiyuan Li , Pen-Chung Yew, Efficient interprocedural analysis for program parallelization and restructuring, Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems, p.85-99, July 19-21, 1988, New Haven, Connecticut, United States
|
| |
55
|
|
 |
56
|
|
 |
57
|
|
 |
58
|
Dror E. Maydan , John L. Hennessy , Monica S. Lam, Efficient and exact data dependence analysis, Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, p.1-14, June 24-28, 1991, Toronto, Ontario, Canada
|
| |
59
|
Maydan, D. E., Hennessy, J. L., and Lam, M. S. 1992. Effectiveness of data dependence analysis. In Proceedings of the NSF-NCRD Workshop on Advanced Compilation Techniques for Novel Architectures.
|
 |
60
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
 |
61
|
|
 |
62
|
|
| |
63
|
|
| |
64
|
Pottenger, B. and Eigenmann, R. 1995. Parallelization in the presence of generalized induction and reduction variables. In Proceedings of the 1995 ACM International Conference on Supercomputing. ACM, New York.
|
 |
65
|
|
| |
66
|
Reilly, J. 1995. SPEC describes SPEC95 products and benchmarks. Spec newsletter, SPEC. September.
|
| |
67
|
Ribas, H. 1990. Obtaining dependence vectors for nested-loop computations. In Proceedings of the 1990 International Conference on Parallel Processing (St. Charles, Ill.).
|
| |
68
|
|
| |
69
|
Sharir, M. and Pnueli, A. 1981. Two approaches to interprocedural data flow analysis. In Program Flow Analysis: Theory and Applications, S. Muchnick and N. Jones, Eds. Prentice-Hall, Inc., Englewood Cliffs., N.J.
|
| |
70
|
|
| |
71
|
Singh, J. P. and Hennessy, J. L. 1991. An empirical investigation of the effectiveness of and limitations of automatic parallelization. In Proceedings of the International Symposium on Shared Memory Multiprocessors (Tokyo, Japan).
|
 |
72
|
|
 |
73
|
|
 |
74
|
|
 |
75
|
Rémi Triolet , Francois Irigoin , Paul Feautrier, Direct parallelization of call statements, Proceedings of the 1986 SIGPLAN symposium on Compiler construction, p.176-185, June 25-27, 1986, Palo Alto, California, United States
|
 |
76
|
|
| |
77
|
|
| |
78
|
|
| |
79
|
Ullman, J. 1973. Fast algorithms for the elimination of common subexpressions. Acta Inf. 2, 191--213.
|
| |
80
|
Uniejewski, J. 1989. SPEC Benchmark Suite: Designed for today's advanced systems. SPEC Newsletter Volume 1, Issue 1, SPEC. Fall.
|
| |
81
|
|
| |
82
|
|
|