Reliability modeling techniques for self-repairing computer systems
Source ACM Annual Conference/Annual Meeting
Proceedings of the 1969 24th national conference
Pages: 295 - 309  
Year of Publication: 1969
Authors
W. G. Bouricius, W. C. Carter, P. R. Schneider
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/800195.805940

ABSTRACT

This paper develops techniques for generating and using mathematical models to evaluate the architectural tradeoffs involved in designing self-repairing, highly reliable computers for long missions. These systems must use standby sparing, and their reliability is shown to be extremely sensitive to small variations in a new design parameter, the coverage c, defined as the probability of system recovery given that a failure has occurred. Interactive terminal calculations show c to be the single most important parameter in high-reliability system design. Changing the coverage from 1 to 0.98 can change, by orders of magnitude, the mission time achievable at a specified reliability. Most techniques for increasing system reliability (e.g., adding more spares) are shown to be futile in the face of an inadequate coverage of 0.99. Examples of system tradeoff evaluation show that adding checking, diagnostics, and similar mechanisms to improve failure coverage is the most advantageous technique. This mandates extensive application of modeling techniques throughout all phases of computer system design.
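
The sensitivity to coverage described in the abstract can be illustrated with a small numerical sketch. The model below is a simplifying assumption, not necessarily the equations used in the paper: one active unit with a constant failure rate, a fixed number of cold standby spares that cannot fail while dormant, and an independent recovery probability c at each failure. Under those assumptions the reliability at normalized time x (failure rate times elapsed time) is exp(-x) times the sum over k = 0..S of (c x)^k / k!. The function names `reliability` and `mission_time` are hypothetical and chosen only for this illustration.

```python
import math


def reliability(x, spares, coverage):
    """Reliability of one active unit backed by `spares` cold standby units.

    x is the normalized mission time (failure rate times elapsed time).
    Assumed model: Poisson failures of the active unit, spares that do not
    fail while dormant, and each switchover succeeding with probability
    `coverage`, independently of the others.
    """
    terms = sum((coverage * x) ** k / math.factorial(k)
                for k in range(spares + 1))
    return math.exp(-x) * terms


def mission_time(target, spares, coverage):
    """Longest normalized mission time with reliability >= target.

    The reliability above decreases monotonically in x, so bisection works.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    # Grow the bracket until the reliability at `hi` falls below the target.
    while reliability(hi, spares, coverage) >= target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if reliability(mid, spares, coverage) >= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    target = 0.999  # illustrative reliability requirement, not from the paper
    for coverage in (1.0, 0.99, 0.98):
        for spares in (1, 2, 4, 10):
            t = mission_time(target, spares, coverage)
            print(f"c={coverage:.2f}  spares={spares:2d}  mission time={t:.4f}")
```

In this simplified model the sum approaches exp(c x) as spares are added, so the reliability can never exceed exp(-(1 - c) x). For any c below 1 the achievable mission time is therefore capped regardless of the number of spares, which is consistent with the abstract's claim that adding spares is futile when coverage is inadequate.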


