ABSTRACT
We present a holistic approach for efficient execution of bags-of-tasks (BOTs) on multiple grids, clusters, and volunteer computing grids virtualized as a single computing platform. The challenge is twofold: to assemble this compound environment and to employ it for execution of a mixture of throughput- and performance-oriented BOTs, each with a dozen to millions of tasks. Our generic mechanism allows per-BOT specification of dynamic, arbitrary scheduling and replication policies as a function of the system state, BOT execution state, and BOT priority. We implement our mechanism in the GridBot system and demonstrate its capabilities in a production setup. GridBot has executed hundreds of BOTs with over 9 million jobs in three months alone; these have been invoked on 25,000 hosts: 15,000 from the Superlink@Technion community grid and the rest from the Technion campus grid, local clusters, the Open Science Grid, EGEE, and the UW Madison pool.