ABSTRACT
Known challenges for petascale machines are that (1) the costs of I/O for high-performance applications can be substantial, especially for output tasks such as checkpointing, and (2) noise from I/O actions can inject undesirable delays into the runtimes of such codes on individual compute nodes. This paper introduces the flexible 'DataStager' framework for data staging, and for alternative services built within it, which jointly address (1) and (2). Data staging services, which move output data from compute nodes to staging or I/O nodes prior to storage, are used to reduce the share of I/O overhead in applications' total processing times, and explicit management of data staging offers reduced perturbation when extracting output data from a petascale machine's compute partition. Experimental evaluations of DataStager on the Cray XT machine at Oak Ridge National Laboratory, using the GTC fusion modeling code and benchmarks running on 1000+ processors, establish both the necessity of intelligent data staging and the high performance of our approach.
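The basic idea behind data staging can be illustrated with a minimal single-process sketch: the application hands its output off to a separate staging resource rather than blocking on storage. This is not the DataStager implementation or API. Here a background thread stands in for the staging node and a Python list stands in for storage. The names `Stager`, `stage`, and `simulate` are hypothetical. The bounded queue is an assumption that models limited staging buffer space. When that queue is full, the compute side blocks. This is the kind of back-pressure that a real staging service has to manage explicitly.

```python
import queue
import threading


class Stager:
    """Hypothetical stand-in for a staging node (not the DataStager API).

    The compute loop hands off output buffers; a background thread drains
    them and performs the slow 'write to storage' step.
    """

    def __init__(self, capacity=4):
        # Bounded queue: limits how much output can be in flight at once.
        # put() blocks when the stager falls behind (back-pressure).
        self._queue = queue.Queue(maxsize=capacity)
        self.storage = []  # simulated persistent storage
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: the application has finished
                break
            self.storage.append(item)  # the "I/O" happens off the compute path

    def stage(self, step, data):
        # Snapshot the buffer so the application can keep mutating its own copy.
        self._queue.put((step, bytes(data)))

    def close(self):
        # Flush: wait until every staged buffer has reached storage.
        self._queue.put(None)
        self._worker.join()


def simulate(steps=5):
    stager = Stager(capacity=2)
    field = bytearray(8)
    for step in range(steps):
        # Compute phase (stand-in): update the field in place.
        for i in range(len(field)):
            field[i] = (field[i] + step) % 256
        # Output phase: hand off to the stager instead of writing synchronously.
        stager.stage(step, field)
    stager.close()
    return stager.storage
```

Because `stage` copies the buffer, each stored snapshot reflects the field at the moment it was staged, even though the compute loop keeps modifying `field` afterward. A real system would move data between nodes (for example, via RDMA) rather than between threads. It would also decide *when* to move data so that the transfers do not interfere with the application's own communication.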