Implementing fault-tolerant services using the state machine approach: a tutorial
Source: ACM Computing Surveys (CSUR)
Volume 22, Issue 4 (December 1990)
Pages: 299-319
Year of Publication: 1990
ISSN: 0360-0300
Author: Fred B. Schneider, Cornell Univ., Ithaca, NY
Publisher: ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 77,   Downloads (12 Months): 647,   Citation Count: 165
DOI: http://doi.acm.org/10.1145/98163.98167

ABSTRACT

The state machine approach is a general method for implementing fault-tolerant services in distributed systems. This paper reviews the approach and describes protocols for two different failure models—Byzantine and fail stop. Systems reconfiguration techniques for removing faulty components and integrating repaired components are also discussed.
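The core idea can be illustrated with a minimal sketch, which is not the paper's own protocol. It assumes every replica is a deterministic state machine that receives the same commands in the same order, and it tolerates up to t faults by running 2t + 1 replicas under Byzantine failures (taking an output reported by at least t + 1 of them) or t + 1 replicas under fail-stop failures. The key-value service and the names Replica, replicas_needed, and execute are hypothetical, chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One deterministic state machine: a tiny key-value store (hypothetical service)."""
    store: dict = field(default_factory=dict)
    faulty: bool = False  # assumption: a faulty replica returns an arbitrary wrong output

    def apply(self, command):
        op, key, *rest = command
        if op == "write":
            self.store[key] = rest[0]
            result = "ok"
        elif op == "read":
            result = self.store.get(key)
        else:
            raise ValueError(f"unknown operation {op!r}")
        return "garbage" if self.faulty else result


def replicas_needed(t, model):
    """Replica count for tolerating t faults under the given failure model."""
    if model == "byzantine":
        return 2 * t + 1  # correct replicas must outvote up to t faulty ones
    if model == "fail-stop":
        return t + 1      # one surviving replica is enough
    raise ValueError(f"unknown failure model {model!r}")


def execute(replicas, command, t):
    """Apply one command at every replica and return an output backed by t + 1 votes."""
    outputs = [r.apply(command) for r in replicas]
    value, count = Counter(outputs).most_common(1)[0]
    if count < t + 1:
        raise RuntimeError("no output has t + 1 matching votes")
    return value


t = 1
replicas = [Replica() for _ in range(replicas_needed(t, "byzantine"))]
replicas[0].faulty = True  # simulate one Byzantine replica
execute(replicas, ("write", "x", 42), t)
print(execute(replicas, ("read", "x"), t))
```

The loop in execute stands in for the agreement and ordering protocols that a real system would need; here all replicas trivially see the same command sequence because a single caller drives them.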



REVIEW

Reviewer: Valentin Cristea

Distributed software structured in terms of clients and servers is considered. Replicas of a single server are executed on separate processors of a distributed system, and protocols coordinate client interactions with these replicas. The paper ...