ABSTRACT
The overlap of computation and communication has long been considered to be a significant performance benefit for applications. Similarly, the ability of MPI to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed to yield performance benefits. Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have been poorly studied at the application level. This lack of analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. This paper extends this previous work by further qualifying the source of the performance advantage (offload, overlap, or independent progress).