|
ABSTRACT
Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this article, we present a technique to improve communication that considers data access patterns of the entire program. Our approach is based on a combination of traditional data-flow analysis and a linear algebra framework, and it works on structured programs with conditional statements and nested loops but without arbitrary goto statements.The distinctive features of the solution are the accuracy in keeping communication set information, support for general alignments and distributions including block-cyclic distribu-tions, and the ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing, and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (anaverage of 32% reduction), the volume of the data communicated (an average of 37%reduction), and the execution time (an average of 26% reduction).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
ADVE, V., MELLOR-CRUMMEY, J., AND SETHI, A. 1997. An integer set framework for HPF analysis and code generation. Technical Report TR97-275, Computer Science Dept., Rice University.
|
| |
3
|
Vikram Adve , John Mellor-Crummey, Advanced code generation for high performance Fortran, Compiler optimizations for scalable parallel systems: languages, compilation techniques, and run time systems, Springer-Verlag New York, Inc., New York, NY, 2001
|
| |
4
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
 |
5
|
|
 |
6
|
|
| |
7
|
|
| |
8
|
BALASUNDARAM, V., FOX, G., KENNEDY, K., AND K_REMER, U. 1990. An interactive environment for data partitioning and distribution. In 5th Distributed Memory Computing Conference, Charleston, SC.
|
| |
9
|
|
| |
10
|
Prithviraj Banerjee , John A. Chandy , Manish Gupta , Eugene W. Hodges IV , John G. Holm , Antonio Lain , Daniel J. Palermo , Shankar Ramaswamy , Ernesto Su, The Paradigm Compiler for Distributed-Memory Multicomputers, Computer, v.28 n.10, p.37-47, October 1995
[doi> 10.1109/2.467577]
|
| |
11
|
|
| |
12
|
|
 |
13
|
Soumen Chakrabarti , Manish Gupta , Jong-Deok Choi, Global communication analysis and optimization, Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation, p.68-78, May 21-24, 1996, Philadelphia, Pennsylvania, United States
|
| |
14
|
Siddhartha Chatterjee , John R. Gilbert , Fred J. E. Long , Robert Schreiber , Shang-Hua Teng, Generating local addresses and communication sets for data-parallel programs, Journal of Parallel and Distributed Computing, v.26 n.1, p.72-84, April 1, 1995
[doi> 10.1006/jpdc.1995.1049]
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
GONG, C., GUPTA, R., AND MELHEM, R. 1993. Compilation techniques for optimizing communication on distributed-memory systems. In Proc. International Conference on Parallel Processing, Volume II, pages 39-46, St. Charles, IL.
|
 |
19
|
|
| |
20
|
|
| |
21
|
|
 |
22
|
Manish Gupta , Sam Midkiff , Edith Schonberg , Ven Seshadri , David Shields , Ko-Yang Wang , Wai-Mee Ching , Ton Ngo, An HPF compiler for the IBM SP2, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.71-es, December 04-08, 1995, San Diego, California, United States
[doi> 10.1145/224170.224422]
|
| |
23
|
|
| |
24
|
|
| |
25
|
M. W. Hall , S. Hiranandani , K. Kennedy , C.-W. Tseng, Interprocedural compilation of Fortran D for MIMD distributed-memory machines, Proceedings of the 1992 ACM/IEEE conference on Supercomputing, p.522-534, November 16-20, 1992, Minneapolis, Minnesota, United States
|
| |
26
|
|
| |
27
|
|
 |
28
|
|
| |
29
|
Wayne Kelly , Vadim Maslov , William Pugh , Evan Rosser , Tatiana Shpeisman , David Wonnacott, The Omega Library interface guide, University of Maryland at College Park, College Park, MD, 1995
|
| |
30
|
|
| |
31
|
KENNEDY, K. AND SETHI, A. 1995. A constrained-based communication placement framework, Technical Report CRPC-TR95515-S, CRPC, Rice University.
|
| |
32
|
|
| |
33
|
|
 |
34
|
Ken Kennedy , Nenad Nedeljkovic , Ajay Sethi, A linear-time algorithm for computing the memory access sequence in data-parallel programs, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.102-111, July 19-21, 1995, Santa Barbara, California, United States
|
| |
35
|
KENNEDY, K., NEDELJKOVIC, N., AND SETHI, A. 1996. Communication generation for cyclic(k) distributions. In Languages, Compilers, and Run-Time Systems for Scalable Computers, B. Szymanski and B. Sinharoy (Eds.), Chapter 14, Kluwer Academic Publishers.
|
 |
36
|
|
| |
37
|
Charles H. Koelbel , David B. Loveman , Robert S. Schreiber , Guy L. Steele, Jr. , Mary E. Zosel, The high performance Fortran handbook, MIT Press, Cambridge, MA, 1994
|
| |
38
|
PALERMO, D. g., Su, E., CHANDY, g. A., AND BANERJEE, P. 1994. Communication optimizations used in the PARADIGM compiler for distributed-memory multicomputers. In Proc. International Conference on Parallel Processing.
|
| |
39
|
POLYCHRONOPOULOS, C., GIRKAR, M. B., HAGHIGHAT, M. R., LEE, C. L., LEUNG, B. P., AND SCHOUTEN, D.A. 1989. Parafrase-2: An environment for parallelizing, partitioning, synchronizing, and scheduling programs on multiprocessors. In Proc. the International Conference on Parallel Processing, St. Charles IL, August 1989, pages II 39-48.
|
 |
40
|
|
 |
41
|
Ernesto Su , Antonio Lain , Shankar Ramaswamy , Daniel J. Palermo , Eugene W. Hodges, IV , Prithviraj Banerjee, Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers, Proceedings of the 9th international conference on Supercomputing, p.424-433, July 03-07, 1995, Barcelona, Spain
[doi> 10.1145/224538.224650]
|
| |
42
|
|
| |
43
|
TARJAN, R. E. 1974. Testing flow graph reducibility. Journal of Computer and System Sciences, 9:355-365.
|
| |
44
|
|
| |
45
|
VAN HANXLEDEN, R. AND KENNEDY, K. 1993. A code placement framework and its application to communication generation. Technical Report CRPC-TR93337-S, CRPC, Rice University.
|
 |
46
|
|
| |
47
|
VENKATACHAR, A., RAMANUJAM, J., AND THIRUMALAI, A. 1997. Communication generation for block-cyclic distributions. Parallel Processing Letters, 7, 2, 195-202, June.
|
| |
48
|
|
| |
49
|
|
| |
50
|
YUAN, X., GUPTA, R., AND MELHEM, R. 1997b. Demand-driven data flow analysis for communication optimization. Parallel Processing Letters, 7, 4, 359-370, December.
|
| |
51
|
|
 |
52
|
|
|