ACM Home Page
Please provide us with feedback. Feedback
Interprocedural parallelization analysis in SUIF
Full text PdfPdf (2.03 MB)
Source ACM Transactions on Programming Languages and Systems (TOPLAS) archive
Volume 27 ,  Issue 4  (July 2005) table of contents
Pages: 662 - 731  
Year of Publication: 2005
ISSN:0164-0925
Authors
Mary W. Hall  USC Information Sciences Institute, Marina del Rey, CA
Saman P. Amarasinghe  Massachusetts Institute of Technology, Cambridge, MA
Brian R. Murphy  Intel Corp.
Shih-Wei Liao  Intel Corp.
Monica S. Lam  Stanford University, Stanford, CA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 108,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1075382.1075385
What is a DOI?

ABSTRACT

As shared-memory multiprocessor systems become widely available, there is an increasing need for tools to simplify the task of developing parallel programs. This paper describes one such tool, the automatic parallelization system in the Stanford SUIF compiler. This article represents a culmination of a several-year research effort aimed at making parallelizing compilers significantly more effective. We have developed a system that performs full interprocedural parallelization analyses, including array privatization analysis, array reduction recognition, and a suite of scalar data-flow analyses including symbolic analysis. These analyses collaborate in an integrated fashion to exploit coarse-grain parallel loops, computationally intensive loops that can execute on multiple processors independently with no cross-processor synchronization or communication. The system has successfully parallelized large interprocedural loops over a thousand lines of code completely automatically from sequential applications.This article provides a comprehensive description of the analyses in the SUIF system. We also present extensive empirical results on four benchmark suites, showing the contribution of individual analysis techniques both in executing more of the computation in parallel, and in increasing the granularity of the parallel computations. These results demonstrate the importance of interprocedural array data-flow analysis, array privatization and array reduction recognition; a third of the programs spend more than 50&percent; of their execution time in computations that are parallelized with these techniques. Overall, these results indicate that automatic parallelization can be effective on sequential scientific computations, but only if the compiler incorporates all of these analyses.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
 
5
6
7
 
8
Bailey, D., Barton, J., Lasinski, T., and Simon, H. 1991. The NAS parallel benchmarks. Int. J. Supercomput. Appl. 5, 3 (Fall), 63--73.
9
10
 
11
12
 
13
Blelloch, G. E. 1990. Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pa. Nov. (Also published in Synthesis of Parallel Algorithms, ed. J. E. Reif.)
 
14
 
15
 
16
 
17
18
19
20
 
21
Cooper, K., Hall, M., and Kennedy, K. 1993. A methodology for procedure cloning. Comput. Lang. 19, 2 (Apr.).
 
22
23
 
24
Cooper, K. D., Hall, M. W., Kennedy, K., and Torczon, L. 1995. Interprocedural analysis and optimization. Commun. Pure Appl. Math. 48.
 
25
Creusillet, B. 1996. Array region analyses and applications. Ph.D. dissertaion. Ecole des Mines de Paris.
 
26
27
 
28
Dantzig, G. and Eaves, B. 1973. Fourier-Motzkin elimination and its dual. J. Combinat. Theory (A) 14, 288--297.
29
 
30
Feautrier, P. 1988b. Parametric integer programming. Recherche Operationnelle/Oper. Res. 22, 3 (Sept.), 243--268.
 
31
Feautrier, P. 1991. Dataflow analysis of scalar and array references. Int. J. Paral. Prog. 20, 1 (Feb.), 23--52.
32
33
34
 
35
 
36
Grout, J. 1995. Inline expansion for the Polaris research compiler. M.S. thesis, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign.
37
 
38
39
40
 
41
 
42
Hall, M. W., Murphy, B. R., and Amarasinghe, S. P. 1995b. Interprocedural analysis for parallelization: Design and experience. In Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing (San Francisco, Calif.). SIAM, Philadelphia, Pa, 650--655.
 
43
 
44
Harrison, W. 1989. The interprocedural analysis and automatic parallelization of Scheme programs. Lisp and Symbolic Computation 2, 3/4 (Oct.), 179--396.
 
45
 
46
 
47
48
 
49
50
51
 
52
Kennedy, K. 1976. A comparison of two algorithms for global data flow analysis. SIAM J. Computing 5, 1, 158--180.
53
54
 
55
56
57
58
 
59
Maydan, D. E., Hennessy, J. L., and Lam, M. S. 1992. Effectiveness of data dependence analysis. In Proceedings of the NSF-NCRD Workshop on Advanced Compilation Techniques for Novel Architectures.
60
61
62
 
63
 
64
Pottenger, B. and Eigenmann, R. 1995. Parallelization in the presence of generalized induction and reduction variables. In Proceedings of the 1995 ACM International Conference on Supercomputing. ACM, New York.
65
 
66
Reilly, J. 1995. SPEC describes SPEC95 products and benchmarks. Spec newsletter, SPEC. September.
 
67
Ribas, H. 1990. Obtaining dependence vectors for nested-loop computations. In Proceedings of the 1990 International Conference on Parallel Processing (St. Charles, Ill.).
 
68
 
69
Sharir, M. and Pnueli, A. 1981. Two approaches to interprocedural data flow analysis. In Program Flow Analysis: Theory and Applications, S. Muchnick and N. Jones, Eds. Prentice-Hall, Inc., Englewood Cliffs., N.J.
 
70
 
71
Singh, J. P. and Hennessy, J. L. 1991. An empirical investigation of the effectiveness of and limitations of automatic parallelization. In Proceedings of the International Symposium on Shared Memory Multiprocessors (Tokyo, Japan).
72
73
74
75
76
 
77
 
78
 
79
Ullman, J. 1973. Fast algorithms for the elimination of common subexpressions. Acta Inf. 2, 191--213.
 
80
Uniejewski, J. 1989. SPEC Benchmark Suite: Designed for today's advanced systems. SPEC Newsletter Volume 1, Issue 1, SPEC. Fall.
 
81
 
82


Collaborative Colleagues:
Mary W. Hall: colleagues
Saman P. Amarasinghe: colleagues
Brian R. Murphy: colleagues
Shih-Wei Liao: colleagues
Monica S. Lam: colleagues