Ridge: combining reliability and performance in open grid platforms
Source
High Performance Distributed Computing
Proceedings of the 16th International Symposium on High Performance Distributed Computing
Monterey, California, USA
SESSION: Reliability and fault tolerance
Pages: 55-64
Year of Publication: 2007
ISBN: 978-1-59593-673-8
Authors
Krishnaveni Budati, University of Minnesota
Jason Sonnek, University of Minnesota
Abhishek Chandra, University of Minnesota
Jon Weissman, University of Minnesota
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1272366.1272374

ABSTRACT

Large-scale donation-based distributed infrastructures need to cope with the inherent unreliability of participant nodes. A widely used work-scheduling technique in such environments is to redundantly schedule the outsourced computations on a number of nodes. We present the design and implementation of RIDGE, a reliability-aware system that uses a node's prior performance and behavior to make more effective scheduling decisions. We have implemented RIDGE on top of the BOINC distributed computing infrastructure and evaluated its performance on a live testbed consisting of 120 PlanetLab nodes. Our experimental results show that RIDGE is able to match or surpass the throughput of the best vanilla BOINC configuration under different reliability environments by automatically adapting to the characteristics of the underlying environment. In addition, RIDGE provides much lower work-unit makespans than BOINC, which makes it well suited to service-oriented environments with time constraints.
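The abstract does not describe RIDGE's scheduling algorithm in detail. As a rough illustration of what reliability-aware redundant scheduling can look like, the following is a minimal Python sketch under stated assumptions, not the RIDGE implementation: each node's reliability is estimated as a Laplace-smoothed fraction of its past results that were correct, node failures are assumed independent, a work unit is assumed to succeed when a strict majority of its replicas return the correct result, and replicas are added greedily, most reliable node first, until that success probability reaches a target. The names `NodeHistory`, `reliability`, `majority_correct_prob`, `pick_group`, and the `target` and `max_size` parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeHistory:
    """Hypothetical per-node record of past work units."""
    name: str
    correct: int  # results that matched the accepted answer
    total: int    # results returned overall


def reliability(node: NodeHistory) -> float:
    """Laplace-smoothed fraction of correct results (an assumed estimator)."""
    return (node.correct + 1) / (node.total + 2)


def majority_correct_prob(probs: list[float]) -> float:
    """P(strictly more than half of the replicas are correct), assuming
    independent nodes; computed as a Poisson-binomial distribution by DP."""
    dist = [1.0]  # dist[k] = P(exactly k correct replicas so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # this replica returns a wrong result
            nxt[k + 1] += mass * p     # this replica returns a correct result
        dist = nxt
    n = len(probs)
    return sum(dist[k] for k in range(n // 2 + 1, n + 1))


def pick_group(nodes: list[NodeHistory], target: float,
               max_size: int = 7) -> list[NodeHistory]:
    """Greedily add the most reliable nodes until the estimated chance of a
    correct majority reaches `target` (hypothetical policy, not RIDGE's)."""
    ranked = sorted(nodes, key=reliability, reverse=True)
    group: list[NodeHistory] = []
    for node in ranked[:max_size]:
        group.append(node)
        if majority_correct_prob([reliability(n) for n in group]) >= target:
            break
    return group  # may fall short of `target` if max_size is reached
```

The smoothing gives a node with no history a reliability of 0.5, so new volunteers are neither fully trusted nor excluded outright; the strict-majority rule is a simplification that ignores colluding or coincidentally identical wrong answers. With four nodes whose records are 98/100, 95/100, 90/100 and 60/100 correct, a target of 0.95 yields a single replica on the most reliable node, while a target of 0.98 yields three replicas on the three most reliable nodes.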

