ABSTRACT
Large-scale donation-based distributed infrastructures need to cope with the inherent unreliability of participant nodes. A widely used work-scheduling technique in such environments is to redundantly schedule the outsourced computations to a number of nodes. We present the design and implementation of RIDGE, a reliability-aware system that uses a node's prior performance and behavior to make more effective scheduling decisions. We have implemented RIDGE on top of the BOINC distributed computing infrastructure and have evaluated its performance on a live testbed consisting of 120 PlanetLab nodes. Our experimental results show that RIDGE is able to match or surpass the throughput of the best vanilla BOINC configuration under different reliability environments by automatically adapting to the characteristics of the underlying environment. In addition, RIDGE provides much lower work unit makespans than BOINC, which indicates its desirability in service-oriented environments with time constraints.