Leveraging non-blocking collective communication in high-performance applications
Source
ACM Symposium on Parallel Algorithms and Architectures
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Munich, Germany
SESSION: Brief announcements
Pages 113-115
Year of Publication: 2008
ISBN: 978-1-59593-973-9
Authors
Torsten Hoefler  Indiana University, Bloomington, IN, USA
Peter Gottschling  Indiana University, Bloomington, IN, USA
Andrew Lumsdaine  Indiana University, Bloomington, IN, USA
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1378533.1378554

ABSTRACT

Although overlapping communication with computation is an important mechanism for achieving high performance in parallel programs, developing applications that actually achieve good overlap can be difficult. Existing approaches are typically based on manual or compiler-based transformations. This paper presents a pattern- and library-based approach to optimizing collective communication in parallel high-performance applications, using non-blocking collective operations to overlap communication with computation. Common communication and computation patterns in iterative SPMD computations motivate the transformations we present. Our approach lets the programmer optimize communication and computation separately, while automating the interaction between the two to achieve maximum overlap. Performance results with a model application show a decrease of more than 90% in communication overhead, resulting in a 21% overall performance improvement.
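
The overlap idea described in the abstract can be illustrated with a small sketch. This is not the paper's library or its transformations; it is a minimal Python simulation under stated assumptions: a background thread stands in for a non-blocking collective (in a real MPI program this role might be played by a call such as MPI_Iallreduce), and the names fake_allreduce, compute_block, and pipelined are hypothetical helpers invented for illustration. The loop starts the collective for one block, computes the next block while it is in flight, and only then waits for the result.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_allreduce(vectors_from_ranks):
    """Stand-in for a collective: element-wise sum over simulated ranks."""
    return [sum(values) for values in zip(*vectors_from_ranks)]


def compute_block(block_id, ranks=4, width=3):
    """Hypothetical local work: each simulated rank produces one vector."""
    return [[(rank + 1) * (block_id + 1)] * width for rank in range(ranks)]


def pipelined(num_blocks):
    """Overlap the collective for block b-1 with the computation of block b."""
    results = []
    # One worker thread plays the role of the communication progress engine.
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None
        for block_id in range(num_blocks):
            # Compute the current block while the previous collective runs.
            local = compute_block(block_id)
            if pending is not None:
                # "Wait": collect the result of the previous collective.
                results.append(pending.result())
            # "Start": launch the collective for this block without blocking.
            pending = comm.submit(fake_allreduce, local)
        if pending is not None:
            # Drain the last outstanding collective.
            results.append(pending.result())
    return results


if __name__ == "__main__":
    print(pipelined(3))
```

The structure (start, compute, wait) is the point of the sketch; the thread pool only imitates asynchronous progress and says nothing about the performance figures reported in the paper.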



