ABSTRACT
To enable the rapid execution of many tasks on compute clusters, we have developed Falkon, a Fast and Light-weight tasK executiON framework. Falkon integrates (1) multi-level scheduling to separate resource acquisition (via, e.g., requests to batch schedulers) from task dispatch, and (2) a streamlined dispatcher. Falkon's integration of multi-level scheduling and streamlined dispatchers delivers performance not provided by any other system. We describe Falkon's architecture and implementation, and present performance results for both microbenchmarks and applications. Microbenchmarks show that Falkon's throughput (487 tasks/sec) and scalability (to 54,000 executors and 2,000,000 tasks processed in just 112 minutes) are one to two orders of magnitude better than those of other systems used in production Grids. Large-scale astronomy and medical applications executed under Falkon by the Swift parallel programming system achieve up to a 90% reduction in end-to-end run time, relative to versions that execute tasks via separate scheduler submissions.
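The separation of resource acquisition from task dispatch can be illustrated with a minimal sketch. This is not Falkon's implementation or API: the names `acquire_executors` and `run_tasks` are hypothetical, the "executors" are local threads, and the acquisition step is a stand-in for a single request to a batch scheduler. The point is only that resources are acquired once, after which many tasks flow through a lightweight queue without further scheduler submissions.

```python
import queue
import threading


def acquire_executors(n):
    """Stand-in for the slow step: ONE request for n workers.

    Hypothetical: a real system would submit an allocation request
    to a batch scheduler here and wait for the workers to start.
    """
    return [f"executor-{i}" for i in range(n)]


def run_tasks(tasks, num_executors):
    """Dispatch all tasks to already-acquired executors via one shared queue."""
    # Level 1: acquire resources once, up front.
    executors = acquire_executors(num_executors)

    # Level 2: lightweight dispatch; no per-task scheduler submission.
    pending = queue.Queue()
    for task_id, fn in enumerate(tasks):
        pending.put((task_id, fn))

    results = {}
    lock = threading.Lock()

    def executor_loop(name):
        # Each executor pulls work until the queue is drained.
        while True:
            try:
                task_id, fn = pending.get_nowait()
            except queue.Empty:
                return
            value = fn()
            with lock:
                results[task_id] = (name, value)

    threads = [
        threading.Thread(target=executor_loop, args=(name,))
        for name in executors
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Usage: 100 small tasks share 4 executors acquired in a single step.
tasks = [lambda i=i: i * i for i in range(100)]
results = run_tasks(tasks, num_executors=4)
```

In this toy version the acquisition cost is paid once, whatever the number of tasks. The abstract's baseline instead issues a separate scheduler submission for each task.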