ABSTRACT
Many scientific applications use parallel I/O to meet their low-latency and high-bandwidth I/O requirements. Among the many available parallel I/O operations, collective I/O is one of the most popular methods when the storage layouts and access patterns of data do not match. The implementation of collective I/O typically involves disk I/O operations followed by interprocessor communications. Also, in many I/O-intensive applications, parallel I/O operations are usually followed by parallel computations. This paper presents a comparative study of different overlap strategies in parallel applications. We have experimented with four different overlap strategies: 1) overlapping I/O and communication; 2) overlapping I/O and computation; 3) overlapping computation and communication; and 4) overlapping I/O, communication, and computation. All experiments were conducted on a Linux cluster, and the performance results obtained are very encouraging. On average, we have enhanced the performance of a generic collective read call by 38%, the MxM benchmark by 26%, and the FFT benchmark by 34%.
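To make the second strategy (overlapping I/O and computation) concrete, the following is a minimal single-process sketch in Python. It is not the paper's MPI-IO implementation: it assumes a background thread as a stand-in for a nonblocking read, an in-memory stream as a stand-in for the file, and the names `read_chunk`, `compute`, and `overlapped_sum` are hypothetical, introduced only for illustration.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def read_chunk(stream, size):
    """Blocking read of one chunk; stands in for a disk I/O call (assumption)."""
    return stream.read(size)


def compute(chunk):
    """Placeholder computation (assumption): sum of the byte values in the chunk."""
    return sum(chunk)


def overlapped_sum(stream, chunk_size):
    """Process a stream chunk by chunk, prefetching the next chunk in a
    background thread while the current chunk is being computed on."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the first read before entering the loop.
        pending = pool.submit(read_chunk, stream, chunk_size)
        while True:
            chunk = pending.result()  # wait for the outstanding read
            if not chunk:
                break  # end of stream, no read left in flight
            # Issue the next read before computing, so the two can overlap.
            pending = pool.submit(read_chunk, stream, chunk_size)
            total += compute(chunk)
    return total


data = bytes(range(256)) * 4
result = overlapped_sum(io.BytesIO(data), chunk_size=100)
```

Only one read is ever in flight, so the stream is never accessed by two threads at once. In an MPI setting the same double-buffering pattern would presumably be expressed with nonblocking I/O or communication calls rather than a thread pool.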
CITED BY
Hasan Abbasi, Matthew Wolf, Greg Eisenhauer, Scott Klasky, Karsten Schwan, Fang Zheng, DataStager: scalable data staging services for petascale applications, Proceedings of the 18th ACM international symposium on High performance distributed computing, June 11-13, 2009, Garching, Germany