ABSTRACT
Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. Depending on workload and resource characteristics, this approach can provide the benefits of dedicated hardware without the associated high costs. The approach is reminiscent of cooperative caching, web caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explore the feasibility of the data diffusion approach, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both micro-benchmarks and a large-scale astronomy application demonstrate that our approach improves performance relative to alternative approaches and provides improved scalability, as aggregate I/O bandwidth scales linearly with the number of data cache nodes.
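The core idea of scheduling computations close to data can be illustrated with a minimal sketch. This is not Falkon's implementation; the class name DataAwareScheduler, its methods, and the file and node names are hypothetical, and the sketch assumes one file per task, ignores cache eviction, and omits dynamic resource acquisition and release. It shows only how a dispatcher might prefer a node that already caches a task's input, and how data spreads to new nodes on a cache miss.

```python
from collections import defaultdict


class DataAwareScheduler:
    """Toy dispatcher (hypothetical): prefer a node that caches the task's file."""

    def __init__(self, nodes):
        self.idle = list(nodes)            # nodes free to take a task, in order
        self.location = defaultdict(set)   # file name -> nodes caching it

    def dispatch(self, file_name):
        """Pick an idle node for a task that reads file_name.

        Returns (node, cache_hit), or None when no node is idle.
        """
        if not self.idle:
            return None
        # Prefer an idle node that already holds the file (cache hit).
        for node in self.idle:
            if node in self.location[file_name]:
                self.idle.remove(node)
                return node, True
        # Otherwise take the first idle node. It would fetch the file from
        # persistent storage and cache it, so the data diffuses to that node.
        node = self.idle.pop(0)
        self.location[file_name].add(node)
        return node, False

    def release(self, node):
        """Mark a node idle again once its task finishes."""
        self.idle.append(node)


if __name__ == "__main__":
    sched = DataAwareScheduler(["n1", "n2"])
    print(sched.dispatch("a.fits"))   # first request: cache miss on some node
    print(sched.dispatch("a.fits"))   # second request: miss on the other node
    sched.release("n2")
    print(sched.dispatch("a.fits"))   # n2 now holds the file: cache hit
```

In this sketch, repeated requests for the same file gradually populate more caches, so later requests are more likely to be served locally. A fuller design would also bound cache size and decide when to acquire or release nodes as demand changes.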
INDEX TERMS
Primary Classification: D. Software; D.4 Operating Systems; D.4.2 Storage Management
Subjects: Storage hierarchies
General Terms: Design, Management, Measurement, Performance
Keywords: Falkon, Swift, data caching, data diffusion, data management, data-aware scheduling, data-intensive applications, grid