|
ABSTRACT
We investigate the problem of evaluating Fortran 90-style array expressions on massively parallel distributed-memory machines. On such a machine, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression tree. The choice of where to perform the operation then affects this cost.We describe the communication cost of the parallel machine theoretically as a metric space; we model the alignment problem as that of finding a minimum-cost embedding of the expression tree into this space. We present algorithms based on dynamic programming that solve the embedding problem optimally for several communication cost metrics: multidimensional grids and rings, hypercubes, fat-trees, and the discrete metric. We also extend our approach to handle operations that change the shape of the arrays.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
| |
2
|
AMERICAN NATIONAL STANDARDS INSTITUTE 1991. Fortran 90:X3J3 internal document $8.118 Submitted as Text for ANSI X3.198-1991, and ISO/IEC JTC1/SC22/WG5 internal document N692 Submitted as Text for ISO/IEC 1539:1991. American National Standards Institute, New York.
|
 |
3
|
|
| |
4
|
BIXBY, R., KENNEDY, K., AND KREMER, U. 1993. Automatic data layout using 0-1 integer programming. Tech. Rep. CRPC-TR93349-S (Nov.), Center for Research on Parallel Computation, Rice University, Houston, Tex.
|
| |
5
|
|
| |
6
|
|
| |
7
|
CHAPMAN, B., MEHROTRA, P., AND ZIMA, H. 1992. Programming in Vienna Fortran. Sci. Program. I, 1 (Fall), 31-50.
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
 |
12
|
Siddhartha Chatterjee , John R. Gilbert , Robert Schreiber , Shang-Hua Teng, Automatic array alignment in data-parallel programs, Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.16-28, March 1993, Charleston, South Carolina, United States
[doi> 10.1145/158511.158517]
|
| |
13
|
CHATTERJEE, S., GILBERT, J. R., SCHREIBER, R., AND TENG, S.-H. 1992. Optimal evaluation of array expressions on massively parallel machines. Tech. Rep. TR 92.17 (Sept.), Research Institute for Advanced Computer Science, NASA Ames Research Center, Moffett Field, Calif. Also available as Xerox PARC Technical Report CSL-92-11. Available by anonymous ftp as RIACS-TR-92.17.PS from riacs.edu in the directory pub/Chatterjee.
|
 |
14
|
|
| |
15
|
DALLY, W. J. AND SEITZ, C. L. 1986. The torus routing chip. Distrib. Comput. 1, 187-196.
|
| |
16
|
FARRIS, J. S. 1970. Methods for computing Wagner trees. Syst. Zoolog. 19, 83-92.
|
| |
17
|
FITCH, W. M. 1971. Toward defining the course of evolution: Minimum change for a specific tree topology. Syst. Zoolog. 20, 406-416.
|
| |
18
|
Fox, G. C., HIRANANDANI, S., KENNEDY, K., KOELBEL, C., KREMER, U., TSt~NG, C.-W., AND Wu, M.-Y. 1990. Fortran D language specification. Tech. Rep. Rice COMP TR90-141 (Dec.), Dept. of Computer Science, Rice University, Houston, Tex.
|
| |
19
|
G ERNDT, M. 1989. Automatic parMlelization for distributed-memory multiprocessing systems. Ph. D. thesis, Univ. of Bonn, Bonn, Germany.
|
| |
20
|
|
| |
21
|
|
| |
22
|
HARTIGAN, J. A. 1973. Minimum mutation fits to a given tree. Biometrics 29, 53-65.
|
| |
23
|
HIGH PERFORMANCE FORTRAN FORUM 1993. High Performance Fortran language specification. Sc,. Program. 5, 1-2, 1-170.
|
| |
24
|
HIRANANDANI. S., KENNEDY, t~., AND TSENG, C.-W. 1991. Compiler support for machineindependent parallel programming in Fortran D. Tech. Prep. Rice COMP TR90-149 (Feb.), Dept. of Computer Science, Rice University, Houston, Tex.
|
| |
25
|
HWANG, F. K. 1986. A linear time algorithm for full Steiner trees. Oper. Res. Left. ~,, 235-237.
|
| |
26
|
KNOBE, K. AND DALLY, W. J. 1994. Subspace optimizations. In Automatic Parallehzat~on New Approaches to Code Ge'nerat~oa, Data D,str~but~on, arid Performance Prediction, C. \V. Kessler, Ed. Vieweg Advanced Studies in Computer Science, Verlag Vieweg, Wiesbaden, Germany, 153-176.
|
| |
27
|
KNOBE, K., LUKAS, J. D., AND DALLY, W. J. 1992. Dynamic alignment on distributed memory systems. In Proceedzngs of the 3rd Workshop on Compilers /or Parallel Computers. Austrian Center for Parallel Computation, 394-404.
|
| |
28
|
|
| |
29
|
KREMER, U., MELLOR-CRUMMEY, J., KENNEDY, K., AND CARLE, A. 1993. Automatic data layout for distributed-memory machines in the D programming environment. Tech. Rep. CRPC-TR93-298-S (Feb.), Center for Research on Parallel Computation, Rice University, Houston, Tex. Appears in Proceedings of the 1st International Workshop on Automatic Distributed Memory Parallelization, Automatic Data D~strzbut~on and Automatic Parallel Performance Predtct~on (AP'93), Vieweg Verlag, Wiesbaden, Germany.
|
| |
30
|
|
 |
31
|
Charles E. Leiserson , Zahi S. Abuhamdeh , David C. Douglas , Carl R. Feynman , Mahesh N. Ganmukhi , Jeffrey V. Hill , Daniel Hillis , Bradley C. Kuszmaul , Margaret A. St. Pierre , David S. Wells , Monica C. Wong , Shaw-Wen Yang , Robert Zak, The network architecture of the Connection Machine CM-5 (extended abstract), Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, p.272-285, June 29-July 01, 1992, San Diego, California, United States
[doi> 10.1145/140901.141883]
|
| |
32
|
|
| |
33
|
|
| |
34
|
ROSE, J. R. AND STEELE, G. L., JR. 1987. C*: An extended C language for data parallel programming. In Proceedzngs of the 2nd InternatzonaI Conference on Supercomputing. ACM, New York, 2-16.
|
| |
35
|
SANKOFF, D. AND R.OUSSEAU, P. 1975. Locating the vertices of a Steiner tree in an arbitrary metric space. Math. Program. 9, 240-246.
|
| |
36
|
|
| |
37
|
|
| |
38
|
|
| |
39
|
|
CITED BY 9
|
|
|
|
|
D. Cociorva , J. W. Wilkins , C. Lam , G. Baumgartner , J. Ramanujam , P. Sadayappan, Loop optimization for a class of memory-constrained computations, Proceedings of the 15th international conference on Supercomputing, p.103-113, June 2001, Sorrento, Italy
|
|
|
Siddhartha Chatterjee , Alvin R. Lebeck , Praveen K. Patnala , Mithuna Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.222-231, June 27-30, 1999, Saint Malo, France
|
|
|
Siddhartha Chatterjee , Vibhor V. Jain , Alvin R. Lebeck , Shyam Mundhra , Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
INDEX TERMS
Primary Classification:
C.
Computer Systems Organization
C.1
PROCESSOR ARCHITECTURES
C.1.2
Multiple Data Stream Architectures (Multiprocessors)
Subjects:
Interconnection architectures (e.g., common bus, multiport memory, crossbar switch)
Additional Classification:
C.
Computer Systems Organization
C.1
PROCESSOR ARCHITECTURES
C.1.2
Multiple Data Stream Architectures (Multiprocessors)
Subjects:
Multiple-instruction-stream, multiple-data-stream processors (MIMD);
Single-instruction-stream, multiple-data-stream processors (SIMD)
D.
Software
D.3
PROGRAMMING LANGUAGES
D.3.4
Processors
Subjects:
Optimization;
Compilers
E.
Data
E.1
DATA STRUCTURES
Subjects:
Arrays
G.
Mathematics of Computing
G.2
DISCRETE MATHEMATICS
G.2.2
Graph Theory
Subjects:
Trees
General Terms:
Algorithms,
Languages,
Theory
Keywords:
Fortran 90,
array alignment,
compact dynamic programming,
data-parallel programming,
distributed memory parallel processors,
fixed topology Steiner tree
|