GridBot: execution of bags of tasks in multiple grids
Source: Conference on High Performance Networking and Computing
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Portland, Oregon
Session: Technical papers
Article No.: 11
Year of Publication: 2009
ISBN: 978-1-60558-744-8
Authors
Mark Silberstein  Israel Institute of Technology
Artyom Sharov  Israel Institute of Technology
Dan Geiger  Israel Institute of Technology
Assaf Schuster  Israel Institute of Technology
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1654059.1654071

ABSTRACT

We present a holistic approach for efficient execution of bags-of-tasks (BOTs) on multiple grids, clusters, and volunteer computing grids virtualized as a single computing platform. The challenge is twofold: to assemble this compound environment and to employ it for execution of a mixture of throughput- and performance-oriented BOTs, each with anywhere from a dozen to millions of tasks. Our generic mechanism allows per-BOT specification of arbitrary dynamic scheduling and replication policies as a function of the system state, BOT execution state, and BOT priority.

We implement our mechanism in the GridBot system and demonstrate its capabilities in a production setup. GridBot has executed hundreds of BOTs with over 9 million jobs in three months alone; these were invoked on 25,000 hosts, 15,000 of them from the Superlink@Technion community grid and the rest from the Technion campus grid, local clusters, the Open Science Grid, EGEE, and the UW Madison pool.
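
The abstract describes policies specified per BOT as a function of the system state, the BOT execution state, and the BOT priority. The following is a minimal Python sketch of what such a policy function could look like. It is an illustration only, not GridBot's actual policy language or API: every name here (SystemState, BotState, Decision, tail_aware_policy, MAX_REPLICAS) and the specific rule (replicate only when pending tasks are fewer than idle hosts) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable

# All types and fields below are hypothetical, chosen for illustration.

@dataclass(frozen=True)
class SystemState:
    idle_reliable_hosts: int    # free hosts on dedicated clusters (assumed notion)
    idle_volunteer_hosts: int   # free hosts on volunteer/community grids

@dataclass(frozen=True)
class BotState:
    total_tasks: int
    pending_tasks: int          # tasks that have not completed yet
    priority: int               # larger means more urgent (assumed convention)

@dataclass(frozen=True)
class Decision:
    use_reliable_only: bool     # restrict remaining tasks to reliable hosts
    replicas: int               # concurrent copies to run per task

# A policy maps (system state, BOT state incl. priority) to a decision.
Policy = Callable[[SystemState, BotState], Decision]

MAX_REPLICAS = 3  # assumed cap to bound wasted work


def tail_aware_policy(system: SystemState, bot: BotState) -> Decision:
    """Run one copy per task while tasks outnumber idle hosts; once hosts
    outnumber pending tasks, replicate according to priority."""
    idle_total = system.idle_reliable_hosts + system.idle_volunteer_hosts
    if bot.pending_tasks >= idle_total:
        # Throughput phase: every host is busy anyway, replication only wastes cycles.
        return Decision(use_reliable_only=False, replicas=1)
    # Tail phase: spare hosts exist, so higher-priority BOTs get more replicas.
    replicas = min(MAX_REPLICAS, 1 + bot.priority)
    reliable_only = system.idle_reliable_hosts >= bot.pending_tasks
    return Decision(use_reliable_only=reliable_only, replicas=replicas)


if __name__ == "__main__":
    system = SystemState(idle_reliable_hosts=10, idle_volunteer_hosts=100)
    print(tail_aware_policy(system, BotState(1_000_000, 500_000, 1)))
    print(tail_aware_policy(system, BotState(100, 5, 2)))
```

Because the policy is an ordinary function of the three inputs, a different BOT could be given a different function without changing the scheduler that calls it, which is the kind of per-BOT flexibility the abstract refers to.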


