ABSTRACT
We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying this approach to enable the use of petascale systems by a broader user community, and with greater ease. Our work enables the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications. This approach allows a new, and potentially far larger, class of applications to leverage petascale systems, such as the IBM Blue Gene/P supercomputer. We present the challenges of I/O performance encountered in making this model practical, and show results using both microbenchmarks and real applications from two domains: economic energy modeling and molecular dynamics. Our benchmarks show that we can scale up to 160K processor-cores with high efficiency, and can achieve sustained execution rates of thousands of tasks per second.