ABSTRACT
Scientific computing has traditionally been associated with harnessing computation cycles within and across clusters of machines. In recent years, scientific applications have become increasingly data-intensive, especially in astronomy and high energy physics. Furthermore, the falling cost of disks and commodity machines has led to a dramatic increase in the amount of free disk space spread across the machines in a cluster. Traditional distributed computing tools do not exploit this space. In this paper we evaluate ways to improve the data management capabilities of Condor, a popular distributed computing system. We augment Condor with the capability to store data used and produced by workflows on the disks of machines in the cluster. We also replace the Condor matchmaker with a new workflow planning framework that is cognizant of dependencies between jobs in a workflow and exploits these new data storage capabilities to produce workflow schedules. We show that our data caching and workflow planning framework can significantly reduce response times for data-intensive workflows by reducing data transfer over the network in a cluster. We also consider ways in which this planning framework can be made adaptive in a dynamic, multi-user, failure-prone environment.
INDEX TERMS
Primary Classification: C. Computer Systems Organization; C.2 Computer-Communication Networks; C.2.4 Distributed Systems
Subjects: Distributed applications
Additional Classification: H. Information Systems; H.3 Information Storage and Retrieval; H.3.4 Systems and Software
Subjects: Distributed systems
General Terms: Algorithms, Design, Experimentation, Management, Performance
Keywords: cluster management, condor, data management, planning, scheduling, scientific computing, workflow management