ABSTRACT
Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics to safe re-execution in multi-threaded code is not obvious, however. For a thread to correctly re-execute a region of code, it must ensure that all other threads that have witnessed its unwanted effects within that region are also reverted to a meaningful earlier state. If this is not done properly, data inconsistencies and other undesirable behavior may result. However, automatically determining what constitutes a consistent global checkpoint is not straightforward, since thread interactions are a dynamic property of the program.

In this paper, we present a safe and efficient checkpointing mechanism for Concurrent ML (CML) that can be used to recover from transient faults. We introduce a new linguistic abstraction called stabilizers that permits the specification of per-thread monitors and the restoration of globally consistent checkpoints. Safe global states are computed through lightweight monitoring of communication events among threads (e.g., message-passing operations or updates to shared variables).

Our experimental results on several realistic, multi-threaded, server-style CML applications, including a web server and a windowing toolkit, show that the overhead of using stabilizers is small, and lead us to conclude that stabilizers are a viable mechanism for defining safe checkpoints in concurrent functional programs.
CITED BY 6
Matthew Fluet, Mike Rainey, John Reppy, Adam Shaw, Yingqi Xiao, Manticore: a heterogeneous parallel language, Proceedings of the 2007 workshop on Declarative aspects of multicore programming, p.37-44, January 16, 2007, Nice, France
Matthew Fluet, Nic Ford, Mike Rainey, John Reppy, Adam Shaw, Yingqi Xiao, Status report: the manticore project, Proceedings of the 2007 Workshop on ML, October 5, 2007, Freiburg, Germany
INDEX TERMS
Primary Classification:
D. Software > D.3 PROGRAMMING LANGUAGES > D.3.3 Language Constructs and Features
Subjects: Concurrent programming structures
Additional Classification:
D. Software > D.1 PROGRAMMING TECHNIQUES > D.1.3 Concurrent Programming
D. Software > D.3 PROGRAMMING LANGUAGES > D.3.1 Formal Definitions and Theory
Subjects: Semantics
General Terms: Design, Experimentation, Languages, Measurement, Performance, Reliability
Keywords: checkpointing, Concurrent ML, concurrent programming, error recovery, exception handling, transactions