ABSTRACT
Fine-Grained Cycle Sharing (FGCS) systems aim to use the large amount of idle computational resources available on the Internet. Such systems allow guest jobs to run on a host as long as they do not significantly impact the host's local users. Because hosts are typically provided voluntarily, their availability fluctuates greatly. To provide fault tolerance to guest jobs without adding significant computational overhead, we propose failure-aware checkpointing techniques that apply knowledge of resource availability to select checkpoint repositories and to determine checkpoint intervals. We present schemes for selecting reliable and efficient repositories from the non-dedicated hosts that contribute their disk storage. These schemes are formulated as 0/1 programming problems that optimize the network overhead of transferring checkpoints and the work lost when a storage host is unavailable at the time it is needed to recover a guest job. We determine the checkpoint interval by comparing the cost of checkpointing immediately with the cost of delaying the checkpoint to a later time, which is a function of resource availability. We evaluate these techniques on an FGCS system called iShare, using trace-based simulation. The results show that they achieve better application performance than the prevalent methods, which checkpoint at a fixed period to dedicated checkpoint servers.
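The comparison between checkpointing immediately and delaying can be illustrated with a minimal sketch. The cost model below is a deliberately simplified assumption and not the formulation used in the paper: the cost of checkpointing now is taken to be the checkpoint overhead alone, and the cost of delaying is the work that would be lost, weighted by the predicted probability that the host becomes unavailable during the delay. The function name and all parameters are hypothetical.

```python
def should_checkpoint_now(unsaved_work_s, checkpoint_cost_s,
                          delay_s, p_unavailable_during_delay):
    """Return True if checkpointing now is no more costly than delaying.

    Toy cost model (an assumption, not the paper's):
      cost of checkpointing now = checkpoint overhead
      cost of delaying          = P(host unavailable during delay)
                                  * (unsaved work + work done during delay)
    All durations are in seconds.
    """
    if not 0.0 <= p_unavailable_during_delay <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if min(unsaved_work_s, checkpoint_cost_s, delay_s) < 0:
        raise ValueError("durations must be non-negative")

    # Overhead paid immediately if we checkpoint now.
    cost_now = checkpoint_cost_s
    # Expected work lost if the host fails before the delayed checkpoint.
    cost_delay = p_unavailable_during_delay * (unsaved_work_s + delay_s)
    return cost_now <= cost_delay


# Hypothetical numbers: 10 minutes of unsaved work, a 30 s checkpoint,
# and a 60 s candidate delay.
high_risk = should_checkpoint_now(600, 30, 60, 0.10)  # expected loss 66 s
low_risk = should_checkpoint_now(600, 30, 60, 0.01)   # expected loss 6.6 s
```

Under these assumed numbers, a 10% predicted chance of unavailability favors checkpointing immediately, while a 1% chance favors delaying. In practice the probability would come from an availability predictor, so the effective checkpoint interval would vary with the host's predicted availability rather than being fixed.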
CITED BY 3
Ardalan Kangarlou, Dongyan Xu, Paul Ruth, Patrick Eugster, Taking snapshots of virtual networked environments, Proceedings of the 3rd international workshop on Virtualization technology in distributed computing, p.1-8, November 12, 2007, Reno, Nevada