ABSTRACT
As scientists expand their models to describe physical phenomena of increasingly large extent, I/O becomes crucial, and a system with limited I/O capacity can severely constrain the performance of the entire program. We provide experimental results, obtained on an Intel Touchstone Delta and an nCUBE 2 I/O system, to show that the performance of existing parallel I/O systems can vary by several orders of magnitude as a function of the data access pattern of the parallel program. We then propose a two-phase access strategy, to be implemented in a runtime system, in which the data distribution on computational nodes is decoupled from the storage distribution. Our experimental results show that performance improvements of several orders of magnitude over direct-access-based data distribution methods can be obtained, and that performance for most data access patterns can be improved to within a factor of 2 of the best performance. Further, the cost of redistribution is a very small fraction of the overall access cost.
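The two-phase idea described above can be illustrated with a small sketch. It is not the authors' runtime system: the function names, the row-major file layout, and the column-block target distribution are all assumptions chosen for illustration, and "processes" are simulated with plain Python lists. Phase 1 reads the file in large contiguous chunks that conform to the storage layout; phase 2 redistributes the data in memory to the layout the computation wants.

```python
def direct_access_requests(n_rows, n_procs):
    """Contiguous read requests if every process reads its own column
    block straight from a row-major file (assumed layout): each process
    needs one separate contiguous run per row."""
    return n_rows * n_procs


def two_phase_read(data, n_rows, n_cols, n_procs):
    """Simulated two-phase read of a row-major array (hypothetical helper).

    Assumes n_rows and n_cols are both divisible by n_procs.
    Returns (per-process column blocks, number of read requests).
    """
    rows_per = n_rows // n_procs
    cols_per = n_cols // n_procs

    # Phase 1: each process reads ONE contiguous chunk of whole rows,
    # which matches the way the data is laid out in storage.
    chunks = [
        data[p * rows_per * n_cols:(p + 1) * rows_per * n_cols]
        for p in range(n_procs)
    ]
    io_requests = n_procs

    # Phase 2: in-memory redistribution, so that process q ends up
    # holding column block q (in row-major order).
    result = [[] for _ in range(n_procs)]
    for chunk in chunks:
        for r in range(rows_per):
            row = chunk[r * n_cols:(r + 1) * n_cols]
            for q in range(n_procs):
                result[q].extend(row[q * cols_per:(q + 1) * cols_per])
    return result, io_requests


if __name__ == "__main__":
    blocks, requests = two_phase_read(list(range(16)), 4, 4, 2)
    print(blocks, requests, direct_access_requests(4, 2))
```

In this toy setting the two-phase read issues one request per process, whereas direct access issues one request per row per process; the redistribution happens entirely in memory, which is the part the abstract reports as a small fraction of the overall cost.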
CITED BY 24
Jens Mache, Virginia Lo, Marilynn Livingston, Sharad Garg, The impact of spatial layout of jobs on parallel I/O performance, Proceedings of the sixth workshop on I/O in parallel and distributed systems, p.45-56, May 05-05, 1999, Atlanta, Georgia, United States
Jason A. Moore, Philip J. Hatcher, Michael J. Quinn, Efficient data-parallel files via automatic mode detection, Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference, p.1-14, May 27-27, 1996, Philadelphia, Pennsylvania, United States
Apratim Purakayastha, Carla Schlatter Ellis, David Kotz, ENWRICH: a compute-processor write caching scheme for parallel file systems, Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference, p.55-68, May 27-27, 1996, Philadelphia, Pennsylvania, United States
Judy Sturtevant, Mark Christon, Philip D. Heermann, Pang-Chieh Chen, PDS/PIO: lightweight libraries for collective parallel I/O, Proceedings of the 1998 ACM/IEEE conference on Supercomputing (CDROM), p.1-11, November 07-13, 1998, San Jose, CA
Rajeev Thakur, William Gropp, Ewing Lusk, On implementing MPI-IO portably and with high performance, Proceedings of the sixth workshop on I/O in parallel and distributed systems, p.23-32, May 05-05, 1999, Atlanta, Georgia, United States
Jack Dongarra, Ian Foster, Geoffrey Fox, William Gropp, Ken Kennedy, Linda Torczon, Andy White, References, Sourcebook of parallel computing, Morgan Kaufmann Publishers Inc., San Francisco, CA, 2003
Tara M. Madhyastha, Garth A. Gibson, Christos Faloutsos, Informed prefetching of collective input/output requests, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), p.13-es, November 14-19, 1999, Portland, Oregon, United States
Jonghyun Lee, Xiaosong Ma, Marianne Winslett, Shengke Yu, Active buffering plus compressed migration: an integrated solution to parallel simulations' data transport needs, Proceedings of the 16th international conference on Supercomputing, June 22-26, 2002, New York, New York, USA
Avery Ching, Wei-keng Liao, Alok Choudhary, Robert Ross, Lee Ward, Noncontiguous locking techniques for parallel file systems, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, November 10-16, 2007, Reno, Nevada
P. F. Corbett, D. G. Feitelson, J.-P. Prost, G. S. Almasi, S. J. Baylor, A. S. Bolmarcich, Y. Hsu, J. Satran, M. Snir, R. Colao, B. D. Herr, J. Kavaky, T. R. Morgan, A. Ziotek, Parallel file systems for the IBM SP computers, IBM Systems Journal, v.34 n.2, p.222-248, 1995