Reliable communication in the presence of failures
Source: ACM Transactions on Computer Systems (TOCS)
Volume 5, Issue 1 (February 1987)
Pages: 47-76
Year of Publication: 1987
ISSN:0734-2071
Authors
Kenneth P. Birman, Cornell Univ., Ithaca, NY
Thomas A. Joseph, Cornell Univ., Ithaca, NY
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 21, Downloads (12 Months): 210, Citation Count: 155
DOI: http://doi.acm.org/10.1145/7351.7478

ABSTRACT

The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency, while respecting application-specific delivery ordering constraints, and have varying cost and performance that depend on the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group will observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties like member rankings. A review of several uses for the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher level algorithms made possible by our approach.
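
The causal delivery ordering mentioned in the abstract can be illustrated with a small sketch. This is not the protocol described in the paper: it swaps in the well-known vector-timestamp technique, and the names `Process`, `send`, `receive`, and `_deliverable` are hypothetical, chosen only for this example. It assumes a fixed group of `n` processes, no failures, and that every message is eventually handed to every other member. A receiver holds a message back until everything that causally precedes it has been delivered.

```python
class Process:
    """One group member that delivers multicasts in causal order (illustrative sketch)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.n = n
        self.clock = [0] * n   # clock[k] = number of messages from k delivered here
        self.pending = []      # received but not yet deliverable
        self.delivered = []    # payloads in delivery order

    def send(self, payload):
        # Count our own message and deliver it locally right away.
        self.clock[self.pid] += 1
        self.delivered.append(payload)
        # The message carries a snapshot of the sender's vector clock.
        return (self.pid, tuple(self.clock), payload)

    def _deliverable(self, msg):
        sender, stamp, _ = msg
        # It must be the next message from its sender ...
        if stamp[sender] != self.clock[sender] + 1:
            return False
        # ... and we must already have delivered everything the sender had seen.
        return all(stamp[k] <= self.clock[k]
                   for k in range(self.n) if k != sender)

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:        # one delivery may unblock other held messages
            progress = False
            for m in list(self.pending):
                if self._deliverable(m):
                    self.pending.remove(m)
                    self.clock[m[0]] += 1
                    self.delivered.append(m[2])
                    progress = True
```

As a usage example, suppose process 0 sends `m1`, process 1 receives it and then sends `m2`, and process 2 happens to receive `m2` before `m1`. Process 2 holds `m2` until `m1` arrives and then delivers both, in the order `m1`, `m2`. Messages with no causal relationship are not constrained relative to each other, which is the source of the concurrency the abstract refers to.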


REVIEW

Reviewer: Andrew Robert Huber

The premise of this paper is that message orderings should be included in the communications layer of a distributed system. This approach is intended to maximize concurrency at the communications level, yet allow processes to determine (when des ...
