Resourceful systems for fault tolerance, reliability, and safety
Source: ACM Computing Surveys (CSUR)
Volume 22, Issue 1 (March 1990)
Pages: 35-68
Year of Publication: 1990
ISSN: 0360-0300
Author
Russell J. Abbott, The Aerospace Corp., Los Angeles, CA
Publisher
ACM, New York, NY, USA

DOI: http://doi.acm.org/10.1145/78949.78951

ABSTRACT

Above all, it is vital to recognize that completely guaranteed behavior is impossible and that there are inherent risks in relying on computer systems in critical environments. The unforeseen consequences are often the most disastrous [Neumann 1986]. Section 1 of this survey reviews the current state of the art of system reliability, safety, and fault tolerance, with an emphasis on the contribution of software to these areas. Section 2 reviews current approaches to software fault tolerance. It discusses why some of the assumptions underlying hardware fault tolerance do not hold for software, and argues that current software fault tolerance techniques are more accurately thought of as delayed debugging than as fault tolerance. It goes on to show that, by providing both backtracking and executable specifications, logic programming offers most of the tools currently used in software fault tolerance. Section 3 presents a generalization of the recovery block approach to software fault tolerance, called resourceful systems. Systems are resourceful if they are able to determine whether they have achieved their goals and, if they have not, to develop and carry out alternate plans. Section 3 then develops an approach to designing resourceful systems based upon a functionally rich architecture and an explicit goal orientation.
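The recovery block approach that Section 3 generalizes can be illustrated with a short sketch. The Python below is a minimal illustration, assuming Randell's formulation of the construct (`ensure` acceptance test `by` primary `else by` alternate ... `else error`): each alternate runs on a fresh copy of the checkpointed input, and its result is returned only if the acceptance test passes. The function names and the sorting task are hypothetical and are not taken from the survey.

```python
import copy
from collections import Counter
from typing import Callable, List, Sequence

Alternate = Callable[[List[int]], List[int]]
AcceptanceTest = Callable[[List[int], List[int]], bool]


class RecoveryBlockError(Exception):
    """Raised when every alternate fails (the "else error" branch)."""


def recovery_block(data: List[int],
                   alternates: Sequence[Alternate],
                   acceptance_test: AcceptanceTest) -> List[int]:
    # Sketch of: ensure acceptance_test by alternates[0] else by alternates[1] ... else error
    for alternate in alternates:
        checkpoint = copy.deepcopy(data)  # each attempt starts from the saved state
        try:
            result = alternate(checkpoint)
        except Exception:
            continue  # an exception raised by the alternate is treated as a failed test
        if acceptance_test(data, result):
            return result
    raise RecoveryBlockError("no alternate passed the acceptance test")


def is_sorted_permutation(original: List[int], result: List[int]) -> bool:
    """Hypothetical acceptance test: result is ordered and has the same elements."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and Counter(result) == Counter(original)


def primary_sort(xs: List[int]) -> List[int]:
    """Hypothetical faulty primary: silently drops duplicate values."""
    return sorted(set(xs))


def alternate_sort(xs: List[int]) -> List[int]:
    """Hypothetical independently written alternate."""
    return sorted(xs)


if __name__ == "__main__":
    print(recovery_block([3, 1, 3, 2], [primary_sort, alternate_sort], is_sorted_permutation))
```

Running it prints `[1, 2, 3, 3]`: the hypothetical primary loses the duplicate `3`, fails the permutation check, and the alternate's result is accepted. Every alternate here is written before deployment and merely selected at run time; one reading of the abstract's "delayed debugging" remark is that such fallbacks are pre-planned fixes rather than responses to unforeseen faults. A resourceful system, as the abstract defines it, would instead use a goal check like `is_sorted_permutation` to trigger the construction of a new plan.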


REFERENCES


 
3
AGRE, P. E., AND CHAPMAN, D. 1987. Pengi: An implementation of a theory of activity. In Proceedings of AAAI 87. American Association for Artificial Intelligence, pp. 268-272.
 
6
AVIZIENIS, A. 1985a. Fault-tolerant computer systems. Class notes. Univ. of California, Los Angeles.
 
7
AVIZIENIS, A. 1985b. The N-version approach to fault-tolerant software. IEEE Trans. Softw. Eng. SE-11, 12 (Dec.), 1491-1501.
 
8
AVIZIENIS, A., AND KELLY, N. J. 1984. Fault tolerance by design diversity: Concepts and experiments. IEEE Comput. 17, 8 (Aug.), 67-80.
 
9
AVIZIENIS, A., AND LAPRIE, J. C. 1986. Dependable computing: From concepts to design diversity. Proc. IEEE 74, 5 (May), 629-638.
 
11
BASTANI, F. B., AND YEN, I. L. 1985. Analysis of an inherently fault tolerant program. In Proceedings of COMPSAC 85. IEEE Computer Society, pp. 428-436.
 
12
BERLINER, H. 1980. Computer backgammon. Scientific American (June).
 
14
BOLOGNA, S., AND LEVESON, N. G. 1986. Special issue on reliability and safety in real-time process control. IEEE Trans. Softw. Eng. SE-12, 9 (Sept.), 877-996.
 
15
BOWEN, T. P., WIGLE, G. B., AND TSAI, J. T. 1985. Specification of software quality attributes. Tech. Rep. RADC-TR-85-37, Rome Air Development Center.
 
17
CHA, S. D., KNIGHT, J. C., LEVESON, N. G., AND SHIMEALL, T. J. 1987. An empirical study of software error detection using self-checks. In Digest of Papers FTCS-17: 17th Annual Symposium on Fault Tolerant Computing. IEEE Computer Society (July), pp. 156-161.
 
19
DALE, C. J. 1982. Software reliability evaluation methods. Tech. Rep. ST-26750, British Aerospace Dynamics Group.
 
21
FLOYD, R. W. 1967. Assigning meanings to programs. In Proceedings of the Symposia on Applied Mathematics (Providence, R.I.), vol. 19, American Mathematical Society, pp. 19-32.
 
23
GEORGEFF, M. P., AND LANSKY, A. L. 1987. Reactive reasoning and planning. In Proceedings of AAAI 87. American Association for Artificial Intelligence, pp. 677-682.
 
24
GILLEY, G. C. 1987. Architectural design methods of transient fault protection. In Proceedings of AIAA Computers in Aerospace VI Conference. AIAA, pp. 78-82.
 
25
GOEL, A. L. 1985. Software reliability models: Assumptions, limitations, and applicability. IEEE Trans. Softw. Eng. SE-11, 12 (Dec.), 1411-1423.
 
26
GOEL, A. L., AND BASTANI, F. B. 1985. Special issue on software and reliability. IEEE Trans. Softw. Eng. SE-11, 12 (Dec.), 1490-1577.
 
28
GRAY, J. 1986. Why do computers stop and what can be done about it? In Proceedings of the 5th Symposium on Reliability in Distributed Software and Database Systems. IEEE Computer Society, pp. 3-12.
 
29
HAMMING, R. W. 1950. Error detecting and error correcting codes. Bell Syst. Tech. J. 29, 2 (Apr.), 147-160.
 
32
JELINSKI, Z., AND MORANDA, P. B. 1972. Software reliability research. In Statistical Computer Performance Evaluation, Freiberger, Ed. Academic Press, New York, pp. 485-502.
 
33
KEILLER, P. A., LITTLEWOOD, B., MILLER, D. R., AND SOFER, A. 1983. Comparison of software reliability predictions. In Digest of the 13th International Symposium on Fault-Tolerant Computing, IEEE Computer Society, pp. 128-143.
 
35
KOVED, L., AND WALDBAUM, G. 1986. Improving availability of software subsystems through online error detection. IBM Syst. J. 25, 1 (Jan.), 96-109.
 
38
LITTLEWOOD, B., AND VERRALL, J. L. 1973. A Bayesian reliability growth model for computer software. J. Roy. Stat. Soc. C 22 (Sept.), 332-346.
 
40
MERRIAM-WEBSTER. 1987. Webster's Ninth New Collegiate Dictionary. Merriam-Webster, Inc., Springfield, Mass.
 
42
MUSA, J. D. 1975. A theory of software reliability and its application. IEEE Trans. Softw. Eng. SE-1, 3 (Sept.), 312-327.
 
46
RANDELL, B. 1977. System structuring for software fault tolerance. In Current Trends in Programming Methodology, R. T. Yeh, Ed. Prentice-Hall, Englewood Cliffs, N.J., pp. 195-219.
 
47
RENNELS, D. A. 1984. Fault-tolerant computing: Concepts and examples. IEEE Trans. Comput. C-33, 12 (Dec.), 1116-1129.
 
48
SEVIORA, R. E. 1987. Knowledge-based program debugging systems. IEEE Softw. 4, 3 (May), 20-32.
 
49
SHOOMAN, M. 1973. Operational testing and software reliability during program development. In Proceedings of the IEEE Symposium on Computer Software Reliability (New York), IEEE Computer Society, pp. 51-57.
 
50
STERLING, L., AND SHAPIRO, E. 1986. The Art of Prolog. MIT Press, Cambridge, Mass.
 
51
STOTT, C. B. 1987. Review of resilient computing systems: vol. I. IEEE Computer 20, 6 (June), 117-118.
 
53
TAYLOR, J. R. 1982. An integrated approach to the treatment of design and specification errors in electronic systems and software. In Reliability in Electrical and Electronic Components and Systems, E. Lauger and J. Moltoft, Eds., North-Holland, Amsterdam.
 
54
TAYLOR, D. J., AND BLACK, J. P. 1982. Principles of data structure error correction. IEEE Trans. Comput. C-31, 7 (July), 602-608.
 
56
TAYLOR, D. J., MORGAN, D. E., AND BLACK, J. P. 1980. Redundancy in data structures: Improving software fault tolerance. IEEE Trans. Softw. Eng. SE-6, 6 (Nov.), 585-594.

CITED BY 12


REVIEW

Reviewer: Bruce M. McMillin

Abbott surveys various classical software techniques for ensuring system fault tolerance. He also presents the less widely known concept of “resourceful systems,” which adapt to changes in environment and preserve their functionality …