|
ABSTRACT
We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for data locality and parallelism, reducing or eliminating false sharing. It also uses interprocedural analysis and transformations to improve the granularity of parallelism. Although the individual components of the algorithm have been published previously, their coordination is unique to this paper. For experimental validation, we do not attempt to parallelize “dusty deck” programs where many have tried and failed. Instead, we collect programs where the users tried to achieve excellent parallel performance. We apply our optimizations to sequential versions of these programs, i.e., the compiler was required to use its analysis and algorithms to parallelize the program and could not rely on user assertions that for example, a loop is parallel. With this metric, our algorithm improves or matches hand-coded parallel programs on shared-memory, bus-based parallel machines for eight of the nine programs in our test suite.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
|
 |
4
|
|
| |
5
|
S. Carr, K. S. MCKinley, and C. Tseng. Compiler optimizations for improving data locality. Technical Report TR92-195, Dept. of Computer Science, Rice University, November 1992.
|
| |
6
|
K. Cooper, M. W. Hall, R. T. Hood, K. Kennedy, K. S. McKinley, J. M. Mellor-Crummey, L. Torczon, and S. K. Warren. The ParaScope parallel programming environment. Proceedings of the IEEE, 81(2):244-263, February 1993.
|
 |
7
|
|
| |
8
|
R. Eigenmann, J. Hoeflinger, G. Jaxon, Z. Li, and D. Padua. Restructuring Fortran programs for Cedar. Concurrency: Practice & Experience, 5(7):553-574, October 1993.
|
 |
9
|
Mary W. Hall , Ken Kennedy , Kathryn S. McKinley, Interprocedural transformations for parallel code generation, Proceedings of the 1991 ACM/IEEE conference on Supercomputing, p.424-434, November 18-22, 1991, Albuquerque, New Mexico, United States
[doi> 10.1145/125826.126055]
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
K. Kennedy and K. S. MCKinley. Typed fusion with applications to parallel and sequential code generation. Technical Report TR93-208, Dept. of Computer Science, Rice University, August 1993.
|
| |
14
|
K. Kennedy, K. S. MCKinley, and C. Tseng. Analysis and transformation in an interactive parallel programming tool. Concurrency: Practice & Experience, 5(7):575--602, October 1993.
|
 |
15
|
D. Kuck , E. Davidson , D. Lawrie , A. Sameh , C. Q. Zhu , A. Veidenbaum , J. Konicek , P. Yew , K. Gallivan , W. Jalby , H. Wijshoff , R. Bramley , U. M. Yang , P. Emrath , D. Padua , R. Eigenmann , J. Hoeflinger , G. Jaxon , Z. Li , T. Murphy , J. Andrews, The cedar system and an initial performance study, Proceedings of the 20th annual international symposium on Computer architecture, p.213-223, May 16-19, 1993, San Diego, California, United States
|
 |
16
|
Zhiyuan Li , Pen-Chung Yew, Efficient interprocedural analysis for program parallelization and restructuring, Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems, p.85-99, July 19-21, 1988, New Haven, Connecticut, United States
|
| |
17
|
K.S. McKinley. Dependence analysis of arrays subscripted by index arrays. Technical Report TR91-162, Dept. of Computer Science, Rice University, December 1990.
|
| |
18
|
|
| |
19
|
|
| |
20
|
J. Singh and J. Hennessy. An empirical investigation of the effectiveness of and limitations of automatic parallelization. In Proceedings of the International Symposium on Shared Memory Multiprocessors, Tokyo, Japan, April 1991.
|
| |
21
|
|
| |
22
|
|
 |
23
|
Rémi Triolet , Francois Irigoin , Paul Feautrier, Direct parallelization of call statements, Proceedings of the 1986 SIGPLAN symposium on Compiler construction, p.176-185, June 25-27, 1986, Palo Alto, California, United States
|
| |
24
|
|
| |
25
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE Design Automation Conference on
Gwo-Dong Chen
, Daniel D. Gajski
|