|
ABSTRACT
The availability of large distributed memory multiprocessors and parallel computers with complex memory hierarchies have left the programmer with the difficult task of planning the detailed parallel execution of a program; for example, in the case of distributed memory machines, the programmer is forced to manually distribute code and data in addition to managing communication among tasks explicitly. Current work in compiler support for this task has focused on automating task partitioning, assuming a fixed data partition or a programmer-specified (in the form of annotations) data partition. This paper argues that one of the reasons for inefficient parallel execution is the lack of synergism between task partitioning and data partitioning and allocation; hence data and task allocation should both be influenced by the inherent dependence structure of the computation (which is the source of synergism). We present a methodology based on a unified approach to task and data partitioning; we show how to derive data and task partitions for computations expressed as nested loops that exhibit regular dependencies and how to map these onto distributed memory multiprocessors. Based on the mapping, we show how to derive code for the nodes of a distributed memory machine with appropriate message transmission constructs. We also discuss related communication optimizations.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
D. Callahan and K. Kennedy, "Compiling Programs for Distributed-Memory Multiprocessors," The Journal of Supercomputing, Vol. 2, Oct. 1988, pp. 151-169.
|
| |
4
|
|
| |
5
|
M. Chen, Y. Choo and J. Li, "Compiling Parallel Programs by Optimizing Performance," The Journal of Supercomputing, 2, 1988, pp. 171-207.
|
| |
6
|
|
| |
7
|
R. Cytron, "Limited Processor Scheduling of Doacross Loops," Proc. 1987 International Conference on Paral. lel Processing, S. Sahni (Ed.), Aug. 1987, pp. 226-234.
|
| |
8
|
J. Fortes and D. Moldown, "ParaUehsm Detection and Transformation Techniques Useful for VLSI Algorithms," Journal of Parallel and D~stributed Computing, Vol. 2, 1985, pp. 277-301.
|
| |
9
|
Geoffrey C. Fox , Mark A. Johnson , Gregory A. Lyzenga , Steve W. Otto , John K. Salmon , David W. Walker, Solving problems on concurrent processors. Vol. 1: General techniques and regular problems, Prentice-Hall, Inc., Upper Saddle River, NJ, 1988
|
 |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
B. Kernighan and S. Lin, "An Efficient Heuristic Procedure for Partitioning Graphs," Bell Syslems Technical Journal, Vol. 49, No. 2, 1970, pp. 291-308.
|
| |
15
|
|
| |
16
|
C. Koelbel and P. Mehrotra, "Semi-automatic Process Decomposition for Non-shared Memory Machines," in Proc. 1989 Inlernational Conference on Supercomputing, May 1989.
|
 |
17
|
D. J. Kuck , R. H. Kuhn , D. A. Padua , B. Leasure , M. Wolfe, Dependence graphs and compiler optimizations, Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.207-218, January 26-28, 1981, Williamsburg, Virginia
[doi> 10.1145/567532.567555]
|
 |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
D. Padua, D. Kuck and D. Lawrie, "High-speed Multiprocessors and Compilation Techniques," IEEE Trans. Computers, Vol. C-29, No. 9, Sep. 1980, pp. 763-776.
|
 |
22
|
|
 |
23
|
|
| |
24
|
P. Sadayappan, F. Ercal ~nd J. Ramanujam, "Cluster Partitioning Approaches to Mapping Parallel Programs onto a Hypercube," to appear in Parallel Computing.
|
| |
25
|
C. Shen and W. Tsai, "A Graph Matching Approach to Optimal Task Assignment in Distributed Computing Systems Using a Minmax Criterion," IEEE Trans. on Computers, Vol. C-34, No. 3, Mar. 1985, pp. 197-203.
|
| |
26
|
B.J. Smith, "A Pipelined, Shared Resource MIMD Computer," Proc. 1978 International Conference on Parallel Processing, pp. 6-8.
|
| |
27
|
|
| |
28
|
|
| |
29
|
|
CITED BY 14
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
D. Kulkarni , K. G. Kumar , A. Basu , A. Paulraj, Loop partitioning for distributed memory multiprocessors as unimodular transformations, Proceedings of the 5th international conference on Supercomputing, p.206-215, June 17-21, 1991, Cologne, West Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|