ACM Home Page
Please provide us with feedback. Feedback
Specifying and using a partitionable group communication service
Full text PdfPdf (474 KB)
Source ACM Transactions on Computer Systems (TOCS) archive
Volume 19 ,  Issue 2  (May 2001) table of contents
Pages: 171 - 216  
Year of Publication: 2001
ISSN:0734-2071
Authors
Alan Fekete  Dept. of Computer Science, University of Sydney, Sydney, Australia
Nancy Lynch  Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Alex Shvartsman  Dept. Computer Science and Engineering, University of Connecticut, Storrs, CT and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 80,   Citation Count: 14
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/377769.377776
What is a DOI?

ABSTRACT

Group communication services are becoming accepted as effective building blocks for the construction of fault-tolerant distributed applications. Many specifications for group communication services have been proposed. However, there is still no agreement about what these specifications should say, especially in cases where the services are partitionable, i.e., where communication failures may lead to simultaneous creation of groups with disjoint memberships, such that each group is unware of the existence of any other group. In this paper, we present a new, succinct specification for a view-oriented partitionable group communication service. The service associates each message with a particular view of the group membership. All send and receive events for a message occur within the associated view. The service provides a total order on the messages within each view, and each processor receives a prefix of this order. Our specification separates safety requirements from performance and fault-tolerance requirements. The safety requirements are expressed by an abstract, global state machine. To present the performance and fault-tolerance requirements, we include failure-status input actions in the specification; we then give properties saying that consensus on the view and timely message delivery are guaranteed in an execution provided that the execution stabilizes to a situation in which the failure-status stops changing and corresponds to consistently partioned system. Because consensus is not required in every execution, the specification is not subject to the existing impossibility results for partionable systems. Our specification has a simple implementation, based on the membership algorithm of Christian and Schmuck. We show the utility of the specification by constructing an ordered-broadcast application, using an algorithm (based on algorithms of Amir, Dolev, Keidar, and others) that reconciles information derived from different instantiations of the group. The application manages the view-change activity to build a shared sequence of messages, i.e., the per-view total orders of the group service are combined to give a universal total order. We prove the correctness and analyze the performance and fault-tolerance of the resulting application.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
AMIR, Y., CHOKLER,G.V.,DOLEV, D., AND VITENBERG, R. 1997. Efficient state transfer in partitionable environments. In Proceedings of 2nd European Research Seminar on Advances in Distributed Systems (ERSADS'97, Zinal, Switzerland, Mar.). 183-192.
 
2
AMIR, Y., DOLEV, D., KRAMER, S., AND MALKI, D. 1992. Transis: A communication subsystem for high availability. In Proceedings of the 22nd IEEE Symposium on Fault-Tolerant Computing (FTCS, Boston, MA, July). IEEE Press, Piscataway, NJ, 76-84.
 
3
AMIR, Y., DOLEV, D., MELLIAR-SMITH, P., AND MOSER, L. 1994. Robust and efficient replication using group communication. 94-20.
 
4
AMIR, Y., MOSER, L., MELLIAR-SMITH, P., AGRAWAL, D., AND CIARFELLA, P. 1993. Fast message ordering and membership using a logical token-passing ring. In Proceedings of 13th IEEE International Conference on Distributed Computing Systems (May). IEEE Press, Piscat-away, NJ, 551-560.
5
 
6
 
7
 
8
 
9
 
10
 
11
 
12
13
14
 
15
CHEINER,O.AND SHVARTSMAN, A. A. 1999. Implementing and evaluating an eventually-serializable data service as a distributed system building block. In Networks in Distributed Computing. DIMACS Series on Discrete Mathematics and Theoretical Computer Science, vol. 45. 43-71.
 
16
COHEN, J., ED. 1996. Commun. ACM. 39, 4 (Apr.).
17
 
18
 
19
CRISTIAN,F.AND SCHMUCK, F. 1995. Agreeing on processor group membership in asynchronous distributed systems. CSE95-428.
20
 
21
22
 
23
DOLEV, D., MALKI, D., AND STRONG, R. 1994. A framework for partitionable membership service. TR94-6.
 
24
DOLEV, S., SEGALA, R., AND SHVARTSMAN, A. 1999. Dynamic load balancing with group communication. In Proceedings of the Sixth International Colloquium on Structural Information and Communication Complexity.
 
25
 
26
 
27
 
28
FRIEDMAN,R.AND VAN RENESSE, R. 1995. Strong and weak virtual synchrony in Horus. Tech Rep. TR-95-1537. Department of Computer Science, Cornell University, Ithaca, NY.
 
29
 
30
 
31
 
32
 
33
HILTUNEN,M.AND SCHLICHTING, R. 1995. Properties of membership services. In Proceedings of the 2nd IEEE Symposium on Autonomous Decentralized Systems (Phoenix, AZ, Apr.). IEEE Press, Piscataway, NJ, 200-207.
 
34
JAHANIAN, F., FAKHOURI, S., AND RAJKUMAR, R. 1993. Processor group membership protocols: Specification, design and implementation. In Proceedings of the 12th IEEE Symposium on Reliable Distributed Systems (Princeton, NJ, Oct.). IEEE Press, Piscataway, NJ, 2-11.
 
35
KEIDAR, I. 1994. A highly available paradigm for consistent object replication. Master's Thesis. See also TR CS95-5 available at http://www.cs.huji.ac.il/ztransis/publications.html.
36
37
 
38
 
39
LYNCH,N.A.AND TUTTLE, M. R. 1989. An introduction to input/output automata. CWI Q. 2, 3, 219-246.
 
40
 
41
 
42
MALLOTH,C.AND SCHIPER, A. 1995. View synchronous communication in large scale networks. In Proceedings of the 2nd Open Workshop on ESPRIT Project BROADCAST (July).
 
43
MISHRA, S., PETERSON,L.L.,AND SCHLICHTING, R. L. 1991. Consul: A communication substrate for fault-tolerant distributed programs. TR 91-32.
 
44
 
45
MOSER,L.E.,AMIR, Y., MELLIAR-SMITH,P.M.,AND AGARWAL, D. A. 1994. Extended virtual synchrony. In Proceedings of the 14th IEEE International Conference on Distributed Computing Systems (ICDCS '94, Poznan, Poland, June). IEEE Computer Society Press, Los Alamitos, CA, 56-65.
46
47
 
48
RICCIARDI, A. 1992. The group membership problem in asynchronous systems. TR92-1313.
 
49
50
 
51
52
 
53
VITENBERG, R., KEIDAR, I., CHOCKLER,G.V.,AND DOLEV, D. 1999. Group communication specifications: A comprehensive study. Tech. Rep. MIT-LCS-TR-790. MIT Laboratory for Computer Science, Cambridge, MA. http://theory.lcs.mit.edu/idish/ftp/gcs-survey-tr.ps.
 
54

CITED BY  15


REVIEW

"Ashoke Deb : Reviewer"

A group communication service is a building block for development of practical distributed systems where processes located at different nodes of the system operate collectively as a group using a communication service to multicast messages to all   more...

Collaborative Colleagues:
Alan Fekete: colleagues
Nancy Lynch: colleagues
Alex Shvartsman: colleagues