Data driven workflow planning in cluster management systems
Source
High Performance Distributed Computing
Proceedings of the 16th international symposium on High performance distributed computing
Monterey, California, USA
SESSION: Scheduling
Pages: 127 - 136  
Year of Publication: 2007
ISBN: 978-1-59593-673-8
Authors
Srinath Shankar, University of Wisconsin
David J. DeWitt, University of Wisconsin
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 118,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/1272366.1272383

ABSTRACT

Traditional scientific computing has been associated with harnessing computation cycles within and across clusters of machines. In recent years, scientific applications have become increasingly data-intensive. This is especially true in the fields of astronomy and high energy physics. Furthermore, the lowered cost of disks and commodity machines has led to a dramatic increase in the amount of free disk space spread across machines in a cluster. This space is not being exploited by traditional distributed computing tools. In this paper we have evaluated ways to improve the data management capabilities of Condor, a popular distributed computing system. We have augmented the Condor system by providing the capability to store data used and produced by workflows on the disks of machines in the cluster. We have also replaced the Condor matchmaker with a new workflow planning framework that is cognizant of dependencies between jobs in a workflow and exploits these new data storage capabilities to produce workflow schedules. We show that our data caching and workflow planning framework can significantly reduce response times for data-intensive workflows by reducing data transfer over the network in a cluster. We also consider ways in which this planning framework can be made adaptive in a dynamic, multi-user, failure-prone environment.
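The idea of planning a workflow around where its data already resides can be illustrated with a small sketch. This is not the planner described in the paper. It is a simple greedy heuristic under stated assumptions: the job names, file names, sizes, and the function plan_workflow are all hypothetical, every machine is assumed to be idle, and the only cost considered is the volume of input data that would have to cross the network.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_workflow(jobs, machines, cached):
    """Greedy data-aware placement (illustrative only, not the paper's planner).

    jobs:     {job: {"inputs": {file: size_mb}, "outputs": {file: size_mb},
                     "deps": [parent_job, ...]}}
    machines: list of machine names
    cached:   {machine: set of files already on that machine's local disk}
    Returns (plan, total_transfer_mb).
    """
    # Work on a copy so the caller's cache description is not modified.
    cache = {m: set(cached.get(m, ())) for m in machines}
    # Visit jobs so that every parent is placed before its children.
    order = TopologicalSorter({j: s["deps"] for j, s in jobs.items()}).static_order()

    plan, total_transfer = {}, 0
    for job in order:
        spec = jobs[job]

        def missing_mb(machine):
            # Input data that is not yet on this machine's disk.
            return sum(size for f, size in spec["inputs"].items()
                       if f not in cache[machine])

        best = min(machines, key=missing_mb)  # ties go to the earlier machine
        total_transfer += missing_mb(best)
        # After the job runs, its inputs and outputs sit on that machine's disk.
        cache[best].update(spec["inputs"])
        cache[best].update(spec["outputs"])
        plan[job] = best
    return plan, total_transfer


# Hypothetical two-step workflow: "analyze" consumes the output of "extract".
jobs = {
    "extract": {"inputs": {"raw.dat": 500}, "outputs": {"clean.dat": 200}, "deps": []},
    "analyze": {"inputs": {"clean.dat": 200}, "outputs": {"result.dat": 10},
                "deps": ["extract"]},
}
machines = ["node1", "node2"]

# node2 already holds raw.dat, so both jobs are placed there with no transfer.
plan, transfer = plan_workflow(jobs, machines, {"node2": {"raw.dat"}})
print(plan, transfer)
```

In this sketch a planner that knows about the dependency between the two jobs keeps the intermediate file on the machine that produced it, so the second job needs no network transfer. A realistic planner would also have to weigh machine load, disk capacity, multiple users, and failures, which this example deliberately ignores.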



