Strategies for storage of checkpointing data using non-dedicated repositories on Grid systems
Source: ACM International Conference Proceeding Series; Vol. 117
Proceedings of the 3rd international workshop on Middleware for grid computing
Grenoble, France
Pages: 1 - 6  
Year of Publication: 2005
ISBN:1-59593-269-0
Authors
Raphael Y. de Camargo  University of São Paulo, Brazil
Renato Cerqueira  PUC-Rio, Brazil
Fabio Kon  University of São Paulo, Brazil
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 29,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/1101499.1101500

ABSTRACT

Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints might be taken to guarantee application progression, producing even more data. The classical approach is to employ high-throughput checkpoint servers connected to the computational nodes by high-speed networks. In the case of Opportunistic Grid Computing, we do not want to be forced to rely on such dedicated hardware. Instead, we want to use the shared Grid nodes to store application data in a distributed fashion.

In this work, we evaluate several strategies to store checkpoints on distributed non-dedicated repositories. We consider the tradeoff among computational overhead, storage overhead, and degree of fault-tolerance of these strategies. We compare the use of replication, parity information, and information dispersal (IDA). We used InteGrade, an object-oriented Grid middleware, to implement the storage strategies and perform evaluation experiments.
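
The tradeoff named in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the InteGrade implementation; the function names (`split_with_parity`, `recover`, `storage_factor`) and the fragment counts are assumptions introduced here. It shows one simple form of parity storage (k data fragments plus a single XOR parity fragment) and a generic way to compare storage cost and tolerated losses across replication, parity, and an IDA-style dispersal in which any `needed` of the `stored` fragments suffice to rebuild the data.

```python
from typing import List, Optional, Tuple


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments (zero-padded) plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytearray(frag_len)
    for frag in frags:
        for j, byte in enumerate(frag):
            parity[j] ^= byte
    return frags + [bytes(parity)]


def recover(frags: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one of the k+1 fragments is missing (None)."""
    missing = [i for i, frag in enumerate(frags) if frag is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment tolerates only one lost fragment")
    frags = list(frags)
    if missing:
        frag_len = len(next(f for f in frags if f is not None))
        rebuilt = bytearray(frag_len)
        # XOR of all surviving fragments equals the missing one.
        for frag in frags:
            if frag is not None:
                for j, byte in enumerate(frag):
                    rebuilt[j] ^= byte
        frags[missing[0]] = bytes(rebuilt)
    # Drop the parity fragment (last) and the zero padding.
    return b"".join(frags[:-1])[:original_len]


def storage_factor(stored: int, needed: int) -> Tuple[float, int]:
    """Return (bytes stored per byte of checkpoint, number of fragment losses tolerated).

    Assumes every fragment has size len(data) / needed:
      replication with r copies   -> stored=r,   needed=1
      parity over k fragments     -> stored=k+1, needed=k
      IDA-style dispersal         -> stored=n,   needed=m (any m of n suffice)
    """
    return stored / needed, stored - needed


if __name__ == "__main__":
    checkpoint = b"application state " * 8
    pieces = split_with_parity(checkpoint, 4)
    pieces[1] = None  # simulate one repository node leaving the Grid
    assert recover(pieces, len(checkpoint)) == checkpoint
    for name, stored, needed in [("replication x2", 2, 1),
                                 ("parity k=4", 5, 4),
                                 ("dispersal 6-of-8", 8, 6)]:
        print(name, storage_factor(stored, needed))
```

Under these assumptions, two-way replication and single parity both survive one lost fragment, but replication stores twice the checkpoint size while parity over four fragments stores 1.25 times; the price is the extra computation needed to encode and rebuild fragments.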



