ABSTRACT
We present a formal approach to implementing fault tolerance in real-time embedded systems. The initial fault-intolerant system consists of a set of independent periodic tasks scheduled onto a set of fail-silent processors connected by a reliable communication network. We transform the tasks so that, given one additional spare processor, the system tolerates one failure at a time (transient or permanent). Failure detection is implemented by heartbeating, and failure masking by checkpointing and rollback. These techniques are described and implemented as automatic program transformations on the tasks' programs. This formal, transformation-based approach to fault tolerance highlights the benefits of separation of concerns: it allows us to establish correctness properties and to compute optimal parameter values that minimize the fault-tolerance overhead. We also present an implementation of our method to demonstrate its feasibility and efficiency.
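To make the two mechanisms named above concrete, the following is a minimal illustrative sketch in Python, not the program transformations of the paper. It assumes simulated time, an in-memory checkpoint, and a single injected failure. All names (`HeartbeatMonitor`, `CheckpointedTask`, `run_with_recovery`) and the parameters `timeout` and `checkpoint_every` are hypothetical and chosen only for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical detector: a processor is suspected once its last
    heartbeat is older than `timeout` (time is simulated, not wall-clock)."""
    timeout: float
    last_seen: dict = field(default_factory=dict)

    def beat(self, proc: str, now: float) -> None:
        # Record the arrival time of a heartbeat from `proc`.
        self.last_seen[proc] = now

    def failed(self, now: float) -> list:
        # Processors whose most recent heartbeat is too old.
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)


@dataclass
class CheckpointedTask:
    """Hypothetical task wrapper that saves and restores its state."""
    state: dict
    _saved: dict = field(default_factory=dict)

    def checkpoint(self) -> None:
        self._saved = copy.deepcopy(self.state)

    def rollback(self) -> None:
        self.state = copy.deepcopy(self._saved)


def run_with_recovery(n_steps: int, checkpoint_every: int, fail_at: int) -> int:
    """Sum 0..n_steps-1, injecting one failure just before step `fail_at`.
    On failure, the state is restored from the last checkpoint and the
    lost steps are re-executed (assumption: at most one failure)."""
    task = CheckpointedTask(state={"i": 0, "acc": 0})
    task.checkpoint()
    failed_once = False
    while task.state["i"] < n_steps:
        i = task.state["i"]
        if i == fail_at and not failed_once:
            failed_once = True
            task.state["acc"] = -1   # model the loss of the running state
            task.rollback()          # resume from the last checkpoint
            continue
        task.state["acc"] += i
        task.state["i"] = i + 1
        if task.state["i"] % checkpoint_every == 0:
            task.checkpoint()
    return task.state["acc"]


monitor = HeartbeatMonitor(timeout=2.0)
for t in (0.0, 1.0):
    monitor.beat("P1", t)
    monitor.beat("P2", t)
monitor.beat("P1", 2.0)              # P2 stays silent after t = 1.0
suspected = monitor.failed(now=3.5)
result = run_with_recovery(n_steps=10, checkpoint_every=3, fail_at=7)
```

In this sketch, a smaller `checkpoint_every` shortens the re-execution after a failure but increases the number of checkpoints taken in failure-free runs. This is the kind of overhead trade-off that the parameter optimization mentioned in the abstract addresses.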