ABSTRACT
The state of the art in software validation, together with the continuing growth in the size and complexity of software subsystems, more than justifies the extra cost of software error tolerance. A program that incorporates software redundancy, i.e., a program in which procedures for run-time validation and recovery are explicitly specified, is generally called a failure-tolerant program. One problem in failure-tolerant programming, which can be particularly serious in real-time computing environments, is the increase in program execution time caused by the validation and recovery procedures. This paper introduces an approach to solving this problem, called failure-tolerant parallel programming. The essence of the approach is to overlap main-stream computation as much as possible with the redundant computation performed for validation and recovery. A model system architecture tailored for efficient execution of failure-tolerant parallel programs is then described. It is highly general and modular, and it contains a novel memory subsystem named the duplex memory. Directions for further research on program structuring and on expansion of the model architecture are also indicated.