ABSTRACT
High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel programming. The goal of HPF is to provide a simple yet efficient machine-independent parallel programming model. Besides algorithm selection, the choice of data layout is the key intellectual step in writing an efficient HPF program. The developers of HPF did not believe that data layouts can be determined automatically in all cases. Therefore, HPF requires the user to specify the data layout. It is the task of the HPF compiler to generate efficient code for the user-supplied data layout. The choice of a good data layout depends on the HPF compiler used, the target architecture, the problem size, and the number of available processors. Allowing remapping of arrays at specific points in the program makes the selection of an efficient data layout even harder. Although finding an efficient data layout fully automatically may not be possible in all cases, HPF users will need support during the data layout selection process. In particular, this support is necessary if the user is not familiar with the characteristics of the target HPF compiler and target architecture, or even with HPF itself. Therefore, tools for automatic data layout and performance estimation will be crucial if HPF is to find general acceptance in the scientific community. This paper discusses a framework for automatic data layout for use in a data layout assistant tool for a data-parallel language such as HPF. The envisioned tool can be used to generate a first data layout for a sequential Fortran program without data layout statements, or to extend a partially specified data layout in an HPF program to a totally specified data layout. Since the data layout assistant is not embedded in a compiler and will run only a few times during the tuning process of an application program, the framework can use techniques that may be too computationally expensive to be included in a compiler.
A prototype data layout assistant tool based on our framework has been implemented as part of the D system currently under development at Rice University. The paper reports preliminary experimental results. The results indicate that the framework is efficient and generates data layouts of high quality.