ABSTRACT
A new mechanism for constructing highly available distributed programs is described. It combines remote procedure call with replication of program modules for fault tolerance. The set of replicas of a module is called a troupe. In a program constructed from troupes, what appears to the programmer as a single inter-module procedure call results in a replicated procedure call. A distributed program constructed in this way will continue to function as long as at least one member of each troupe survives. The semantics of replicated procedure calls and troupes are defined and algorithms are presented that support these semantics.
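The survival property stated in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm. It is an in-process simulation under stated assumptions: troupe members are modeled as plain Python callables, a failed member is modeled by raising the hypothetical `MemberCrashed` exception, and members are assumed to be deterministic so that all surviving replicas return the same value. The names `replicated_call`, `MemberCrashed`, and `TroupeFailed` are invented for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class MemberCrashed(Exception):
    """Hypothetical signal that a simulated troupe member has failed."""


class TroupeFailed(Exception):
    """Raised when no member of the troupe produced a result."""


def replicated_call(troupe: Sequence[Callable[..., T]], *args) -> T:
    """Invoke every member of the troupe and return a single result.

    Assumption: members are deterministic, so every surviving replica
    computes the same value and the first collected reply can be returned.
    """
    results: List[T] = []
    for member in troupe:
        try:
            results.append(member(*args))
        except MemberCrashed:
            continue  # skip the crashed replica and try the others
    if not results:
        raise TroupeFailed("every troupe member crashed")
    return results[0]


def healthy_add(a: int, b: int) -> int:
    """A surviving replica of a hypothetical 'add' module."""
    return a + b


def crashed_add(a: int, b: int) -> int:
    """A failed replica of the same module."""
    raise MemberCrashed()


# The caller writes what looks like one procedure call; the call succeeds
# because at least one member of the troupe survives.
total = replicated_call([crashed_add, healthy_add, healthy_add], 2, 3)
```

The caller sees an ordinary procedure call and is unaffected by the crashed replica. Only when every member has failed does the call itself fail, which mirrors the "at least one member of each troupe survives" condition in the abstract.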