ABSTRACT
With the increasing use of computing systems in such crucial areas as medicine and space, there is a great need for computers that remain operational in spite of hardware failures. This paper provides a brief overview of several approaches to fault-tolerant computing. Five hardware redundancy techniques are reviewed: static, dynamic, hybrid, self-purging, and the reconfiguration scheme. In addition, the advantages and disadvantages of error-correcting codes and software fault-tolerant systems are outlined, as are those of bi-duplexed systems, alternating logic, fail-soft systems, and shared logic systems. It is suggested that the best fault-tolerant system may be one that employs a combination of hardware redundancy techniques and software protection.