|
ABSTRACT
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. After the algorithm selection, the data layout choice is the key intellectual challenge in writing an efficient program in such languages. The performance of a data layout depends on the target compilation system, the target machine, the problem size, and the number of available processors. This makes the choice of a good layout extremely difficult for most users of such languages. If languages such as HPF are to find general acceptance, the need for data layout selection support has to be addressed. We beleive that the appropriate way to provide the needed support is through a tool that generates data layout specifications automatically. This article discusses the design and implementation of a data layout selection tool that generates HPF-style data layout specifications automatically. Because layout is done in a tool that is not embedded in the target compiler and hence will be run only a few times during the tuning phase of an application, it can use techniques such as integer programming that may be considered too computationally expensive for inclusion in production compilers. The proposed framework for automatic data layout selection builds and examines search spaces of candidate data layouts. A candidate layout is an efficient layout for some part of the program. After the generation of search spaces, a single candidate layout is selected for each program part, resulting in a data layout for the entire program. A good overall data layout may require the remapping of arrays between program parts. A performance estimator based on a compiler model, an execution model, and a machine model are needed to predict the execution time of each candidate layout and the costs of possible remappings between candidate data layouts. In the proposed framework, instances of NP-complete problems are solved during the construction of candidate layout search spaces and the final selection of candidate layouts from each search space. Rather than resorting to heuristics, the framework capitalizes on state-of-the-art 0-1 integer programming technology to compute optimal solutions of these NP-complete problems. A prototype data layout assistant tool based on our framework has been implemented as part of the D system currently under development at Rice University. The article reports preliminary experimental results. The results indicate that the framework is efficient and allows the generation of data layouts of high quality.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Vikram Adve , Alan Carle , Elana Granston , Seema Hiranandani , Ken Kennedy , Charles Koelbel , Ulrich Kremer , John Mellor-Crummey , Scott Warren , Chau-Wen Tseng, Requirements for Data-Parallel Programming Environments, IEEE Parallel & Distributed Technology: Systems & Technology, v.2 n.3, p.48-58, September 1994
[doi> 10.1109/M-PDT.1994.329801]
|
| |
2
|
Vikram Adve , John Mellor-Crummey, Advanced code generation for high performance Fortran, Compiler optimizations for scalable parallel systems: languages, compilation techniques, and run time systems, Springer-Verlag New York, Inc., New York, NY, 2001
|
| |
3
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
 |
4
|
Erik R. Altman , R. Govindarajan , Guang R. Gao, Scheduling and mapping: software pipelining in the presence of structural hazards, Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation, p.139-150, June 18-21, 1995, La Jolla, California, United States
|
 |
5
|
R. Govindarajan , Erik R. Altman , Guang R. Gao, Minimizing register requirements under resource-constrained rate-optimal software pipelining, Proceedings of the 27th annual international symposium on Microarchitecture, p.85-94, November 30-December 02, 1994, San Jose, California, United States
[doi> 10.1145/192724.192733]
|
| |
6
|
|
 |
7
|
|
| |
8
|
Eduard Ayguadé , Jordi Garcia , Mercè Gironés , Jesús Labarta , Jordi Torres , Mateo Valero, Detecting and Using Affinity in an Automatic Data Distribution Tool, Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing, p.61-75, August 08-10, 1994
|
 |
9
|
Vasanth Balasundaram , Geoffrey Fox , Ken Kennedy , Ulrich Kremer, A static performance estimator to guide data partitioning decisions, Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming, p.213-223, April 21-24, 1991, Williamsburg, Virginia, United States
|
| |
10
|
|
| |
11
|
BIXBY, R. 1992. Implementing the Simplex method: The initial basis. ORSA J. Comput. 4, 3, 267-284.
|
| |
12
|
|
| |
13
|
CALLAHAN, D. AND KENNEDY, K. 1988. Compiling programs for distributed-memory multiprocessots. J. Supercomput. 2, 151-169.
|
| |
14
|
|
 |
15
|
Siddhartha Chatterjee , John R. Gilbert , Robert Schreiber , Shang-Hua Teng, Automatic array alignment in data-parallel programs, Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.16-28, March 1993, Charleston, South Carolina, United States
[doi> 10.1145/158511.158517]
|
 |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
D'HOLLANDER, E. 1989. Partitioning and labeling of index sets in do loops with constant dependence. In Proceedings of the 1989 International Conference on Parallel Processing. II139-II144.
|
| |
21
|
|
| |
22
|
|
| |
23
|
Fox, G., HIRANANDANI, S., KENNEDY, K., KOELBEL, C., KREMER, W., TSENG, C., AND WU, M. 1990. Fortran D language specification. Tech. Rep. TR90-141, Dept. of Computer Science, Rice University, Houston, Tex.
|
| |
24
|
Geoffrey C. Fox , Mark A. Johnson , Gregory A. Lyzenga , Steve W. Otto , John K. Salmon , David W. Walker, Solving problems on concurrent processors. Vol. 1: General techniques and regular problems, Prentice-Hall, Inc., Upper Saddle River, NJ, 1988
|
| |
25
|
GARCIA, J. 1997. Automatic data distribution for massively parallel processors. Ph.D. thesis, Universitat Polit~cnica de Catalunya, Barcelona, Spain.
|
| |
26
|
GARCIA, J., AYGUADI~, E., AND LABARTA, J. 1995. A novel approach towards automatic data distribution. In Proceedings of the Workshop on Automatic Data Layout and Performance Prediction (AP'95). Available as Rice University Tech. Rep. CRPC-TR95-548.
|
| |
27
|
Jordi Garcia , Eduard Ayguade , Jesus Labarta, Dynamic data distribution with control flow analysis, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.11-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
[doi> 10.1145/369028.369048]
|
| |
28
|
|
| |
29
|
|
| |
30
|
|
| |
31
|
|
| |
32
|
|
 |
33
|
|
 |
34
|
|
 |
35
|
|
| |
36
|
|
| |
37
|
|
| |
38
|
Charles H. Koelbel , David B. Loveman , Robert S. Schreiber , Guy L. Steele, Jr. , Mary E. Zosel, The high performance Fortran handbook, MIT Press, Cambridge, MA, 1994
|
| |
39
|
|
| |
40
|
|
| |
41
|
KREMER, U. 1997. Optimal and near-optimM solutions for hard compilation problems. Parallel Process. Lett. 7, 2, 371-378.
|
| |
42
|
|
| |
43
|
KREMER, U. AND RAMIe, M. 1994. Compositional oil reservoir simulation in Fortran D" A feasibility study on Intel iPSC,/860. Int. J. Supercomput. Appl. 8, 2 (Summer), 119-128.
|
| |
44
|
LEE, P. AND TSAI, T.-B. 1993. Compiling efficient programs for tightly-coupled distributed mereory computers. In Proceedings of the 1993 International Conference on Parallel Processing. II161-II165.
|
| |
45
|
|
| |
46
|
|
| |
47
|
|
| |
48
|
|
| |
49
|
|
 |
50
|
|
| |
51
|
|
| |
52
|
PALERMO, D. AND BANERJEE, P. 1995. Automatic selection of dynamic data partitioning schemes for distributed-memory multicomputers. Technical Report CRHC-95-09, Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Ill.
|
| |
53
|
|
 |
54
|
|
| |
55
|
PRESBERG, D. L. 1996. Comparison of 3 HPF compilers for the IBM SP. NHSF, Rev. 1996, 2. Available at http://www.crpc.rice.edu/NHSEreview/96-2.html.
|
 |
56
|
|
 |
57
|
|
 |
58
|
|
| |
59
|
ROSE, J. AND STEELE, G. 1987. C*: An extended C language for data parallel programming. Technical Report PL87-5, Thinking Machines, Inc., Cambridge, Mass.
|
| |
60
|
|
| |
61
|
|
| |
62
|
Thinking Machines Corp. 1989. CM Fortran Reference Manual, Version 5.2-0.6 ed. Thinking Machines Corp., Cambridge, Mass.
|
| |
63
|
|
| |
64
|
|
 |
65
|
|
| |
66
|
ZIMA, H., BAST, H.-J., AND GERNDT, M. 1988. SUPERB: A tool for semi-automatic MIMD,/SIMD parallelization. Parallel Comput. 6, 1-18.
|
CITED BY 26
|
|
|
|
|
Siddhartha Chatterjee , Alvin R. Lebeck , Praveen K. Patnala , Mithuna Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.222-231, June 27-30, 1999, Saint Malo, France
|
|
|
Siddhartha Chatterjee , Vibhor V. Jain , Alvin R. Lebeck , Shyam Mundhra , Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Robert Springer , David K. Lowenthal , Barry Rountree , Vincent W. Freeh, Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vincent W. Freeh , David K. Lowenthal , Feng Pan , Nandini Kappiah , Rob Springer , Barry L. Rountree , Mark E. Femal, Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications, IEEE Transactions on Parallel and Distributed Systems, v.18 n.6, p.835-848, June 2007
|
|
|
|
|
|
|
ADDITIONAL RESOURCES
Appendices A-C do not appear in the print version of this
article. They are contained in the online version of this article
and are also available in a separate online document.
REVIEW
"Michael Wolfe : Reviewer"
In High Performance Fortran and similar languages, the user is
responsible for adding explicit data distribution directives. Since this
is a major impediment to the widespread use of such languages, recent
research has studied automatic data d
more...
|