ABSTRACT
It is widely known that MPI-IO performs poorly in a Lustre file system environment, although the reasons for this poor performance are not well understood. The research presented in this paper strongly supports our hypothesis that MPI-IO performs poorly in this environment because of the fundamental assumptions upon which most parallel I/O optimizations are based. In particular, it is almost universally believed that parallel I/O performance is optimized when aggregator processes perform large, contiguous I/O operations in parallel. Our research shows that this approach generally provides the worst performance in a Lustre environment, and that the best performance is often obtained when the aggregator processes perform a large number of small, non-contiguous I/O operations. In this paper, we first demonstrate and explain these non-intuitive results. We then present a user-level library, termed Y-lib, which redistributes data in a way that conforms much more closely to the Lustre storage architecture than does the data redistribution pattern employed by MPI-IO. Next, we provide experimental results showing that Y-lib can increase performance by between 300% and 1000%, depending on the number of aggregator processes and the file size. Finally, we modify MPI-IO itself to use our data redistribution scheme, and show that doing so yields a performance increase of similar magnitude over the current MPI-IO data redistribution algorithms.
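The contrast between the two access patterns can be illustrated with a small sketch. It is not the Y-lib implementation, which the abstract does not describe in detail. It assumes simple round-robin striping (stripe `s` is stored on storage target `s % stripe_count`) and assumes that "conforming to the Lustre storage architecture" means giving each aggregator only the stripes held by one target. The function names and parameter values are hypothetical and chosen only for illustration.

```python
# Illustrative model only: round-robin striping is an assumption, and the
# "aligned" assignment is one plausible reading of the abstract, not Y-lib itself.

MIB = 1024 * 1024


def targets_contiguous(rank, n_aggr, file_size, stripe_size, stripe_count):
    """Storage targets touched when each aggregator writes one large contiguous block."""
    block = file_size // n_aggr
    first_stripe = (rank * block) // stripe_size
    last_stripe = (rank * block + block - 1) // stripe_size
    return {s % stripe_count for s in range(first_stripe, last_stripe + 1)}


def targets_aligned(rank, n_aggr, file_size, stripe_size, stripe_count):
    """Storage targets touched when each aggregator writes many small,
    non-contiguous stripes that all live on the same target(s)."""
    n_stripes = -(-file_size // stripe_size)  # ceiling division
    return {
        s % stripe_count
        for s in range(n_stripes)
        if (s % stripe_count) % n_aggr == rank
    }


if __name__ == "__main__":
    # Hypothetical configuration: 64 MiB file, 1 MiB stripes, 4 targets, 4 aggregators.
    args = dict(n_aggr=4, file_size=64 * MIB, stripe_size=MIB, stripe_count=4)
    for rank in range(4):
        print(rank,
              sorted(targets_contiguous(rank, **args)),
              sorted(targets_aligned(rank, **args)))
```

Under these assumptions, every aggregator in the contiguous scheme communicates with every storage target, while each aggregator in the aligned scheme communicates with only one. This is consistent with the abstract's observation that many small, non-contiguous operations can outperform large contiguous ones, though the sketch models only the communication pattern and not performance.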