ABSTRACT
The state machine approach is a general method for implementing fault-tolerant services in distributed systems. This paper reviews the approach and describes protocols for two different failure models: Byzantine and fail-stop. System reconfiguration techniques for removing faulty components and integrating repaired components are also discussed.
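As a rough illustration of the idea summarized above, the following minimal sketch runs the same command sequence on several deterministic replicas and masks a faulty one by voting on the outputs. It assumes 2t+1 replicas to tolerate t Byzantine replicas, and it stands in for the agreement and ordering protocols with a single shared in-process command list. The names `Replica`, `FaultyReplica`, `vote`, `run`, and the command format are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical command format: (operation, key, value).
Command = Tuple[str, str, int]


class Replica:
    """A deterministic state machine: the same commands applied in the
    same order always produce the same states and outputs."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def apply(self, command: Command) -> int:
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown operation: {op}")
        return self.state[key]


class FaultyReplica(Replica):
    """Stand-in for a Byzantine replica: it ignores the command and
    reports an arbitrary wrong output."""

    def apply(self, command: Command) -> int:
        return -1


def vote(outputs: List[int], t: int) -> int:
    """Return the output reported by at least t + 1 replicas.

    With at most t faulty replicas, any value backed by t + 1 votes
    was reported by at least one correct replica."""
    value, count = Counter(outputs).most_common(1)[0]
    if count < t + 1:
        raise RuntimeError("no output has t + 1 matching votes")
    return value


def run(replicas: List[Replica], log: List[Command], t: int) -> List[int]:
    """Apply every command to every replica in the same order (assumed
    here, not enforced by a protocol) and vote on each output."""
    results = []
    for command in log:
        outputs = [replica.apply(command) for replica in replicas]
        results.append(vote(outputs, t))
    return results


if __name__ == "__main__":
    # Three replicas (2t + 1 with t = 1), one of which is faulty.
    ensemble = [Replica(), FaultyReplica(), Replica()]
    print(run(ensemble, [("set", "x", 5), ("add", "x", 2)], t=1))
```

In a real deployment the shared list would be replaced by a protocol that delivers every request to all non-faulty replicas in the same order. For fail-stop processors, which halt rather than return wrong outputs, voting is unnecessary and the output of any surviving replica can be used.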
REVIEW
Reviewer: Valentin Cristea
Distributed software structured in terms of clients and servers is
considered. Replicas of a single server are executed on separate
processors of a distributed system, and protocols coordinate client
interactions with these replicas. The paper ...