ABSTRACT
Many large-scale applications have significant I/O requirements in addition to their computational and memory requirements. Unfortunately, the limited number of I/O nodes in a typical configuration of modern message-passing distributed-memory architectures, such as the Intel Paragon and IBM SP-2, severely limits the I/O performance of these applications. In this paper, we examine several software optimization techniques and evaluate their effects on five different I/O-intensive codes from both small and large application domains. Our goals in this study are twofold. First, we want to understand the behavior of large-scale data-intensive applications and the impact of I/O subsystems on their performance, and vice versa. Second, and more importantly, we strive to determine solutions for improving the applications' performance through a mix of software techniques. Our results reveal that different applications benefit from different optimizations. For example, we found that some applications benefit from file layout optimizations, whereas others take advantage of collective I/O. A combination of architectural and software solutions is normally needed to obtain good I/O performance. For example, we show that with a limited number of I/O resources, it is possible to obtain good performance by using appropriate software optimizations. We also show that beyond a certain level, imbalance in the architecture results in performance degradation even with optimized software, indicating that I/O resources must be increased.
REVIEW
Reviewer: John A. Fulcher
The authors’ stated aim is twofold: first, to understand the behavior of large-scale, data-intensive applications and the impact of I/O subsystems on their performance, and second, to improve the performance of such applications. The f