MPI-aware compiler optimizations for improving communication-computation overlap
Source
International Conference on Supercomputing
Proceedings of the 23rd International Conference on Supercomputing
Yorktown Heights, NY, USA
Session: High-performance communications II
Pages 316-325
Year of Publication: 2009
ISBN:978-1-60558-498-0
Authors
Anthony Danalis  University of Delaware, Newark, DE, USA
Lori Pollock  University of Delaware, Newark, DE, USA
Martin Swany  University of Delaware, Newark, DE, USA
John Cavazos  University of Delaware, Newark, DE, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI
http://doi.acm.org/10.1145/1542275.1542321

ABSTRACT

Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as a black box with unknown side effects and thus miss potential optimizations. This paper's contributions enable the development of an MPI-aware optimizing compiler that can perform transformations exploiting knowledge of MPI call effects to increase communication-computation overlap. We formulate a set of data flow equations and rules to describe the side effects of key MPI functions so an MPI-aware compiler can automatically assess the safety of transformations. After categorizing existing compiler transformations based on their effect on the application code, we present an optimization algorithm that specifies when and how to apply these optimizing transformations to achieve improved communication-computation overlap. By manually applying the optimization algorithm to kernels extracted from HYCOM and the NAS benchmarks, we show that even when transforming these highly optimized codes, execution time can be decreased by an average of over 30%.
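
To make the idea of "knowing the side effects of MPI calls" concrete, the following is a minimal illustrative sketch in Python. It is not the paper's data flow equations or its optimization algorithm, which the abstract does not spell out. It assumes only the standard MPI rule that user code must not write a send buffer, or touch a receive buffer at all, between a nonblocking call and its matching wait. All names here (Stmt, PendingComm, conflicts, depends, split_for_overlap) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """A statement summarized by the buffers it reads and writes (toy model)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


@dataclass(frozen=True)
class PendingComm:
    """A nonblocking MPI operation that has been started but not yet waited on."""
    kind: str    # "isend" or "irecv"
    buffer: str  # name of the user buffer handed to MPI


def conflicts(comm, stmt):
    """True if stmt may not execute while comm is still in flight."""
    if comm.kind == "isend":
        # MPI reads the send buffer until completion, so user code must not write it.
        return comm.buffer in stmt.writes
    if comm.kind == "irecv":
        # MPI writes the receive buffer until completion, so user code must not touch it.
        return comm.buffer in stmt.reads or comm.buffer in stmt.writes
    raise ValueError(f"unknown communication kind: {comm.kind}")


def depends(later, earlier):
    """True if later has a flow, anti, or output dependence on earlier."""
    return bool(
        (later.reads & earlier.writes)
        or (later.writes & earlier.reads)
        or (later.writes & earlier.writes)
    )


def split_for_overlap(comm, stmts):
    """Split the statements that follow the wait into (hoisted, kept).

    Hoisted statements can move above the wait and overlap with the transfer.
    Kept statements stay after the wait, either because they touch the buffer
    or because they depend on a statement that stays.
    """
    hoisted, kept = [], []
    for s in stmts:
        if conflicts(comm, s) or any(depends(s, k) for k in kept):
            kept.append(s)
        else:
            hoisted.append(s)
    return hoisted, kept


if __name__ == "__main__":
    # Hypothetical halo exchange: the interior update does not need the halo.
    recv = PendingComm("irecv", "halo")
    body = [
        Stmt("interior", reads=frozenset({"u"}), writes=frozenset({"v"})),
        Stmt("boundary", reads=frozenset({"halo"}), writes=frozenset({"v_edge"})),
        Stmt("norm", reads=frozenset({"v", "v_edge"}), writes=frozenset({"norm"})),
    ]
    hoisted, kept = split_for_overlap(recv, body)
    print([s.name for s in hoisted])  # ['interior']
    print([s.name for s in kept])     # ['boundary', 'norm']
```

A compiler that treats the MPI calls as a black box would have to assume the wait may read or write every variable, so nothing could move across it. With per-call side-effect summaries, only statements that touch the in-flight buffer, or depend on one that does, are pinned after the wait.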

