BASE: Using abstraction to improve fault tolerance
Source: ACM Transactions on Computer Systems (TOCS)
Volume 21, Issue 3 (August 2003)
Pages: 236-269
Year of Publication: 2003
ISSN: 0734-2071
Authors
Miguel Castro, Microsoft Research, Cambridge, UK
Rodrigo Rodrigues, MIT Laboratory for Computer Science, Cambridge, MA
Barbara Liskov, MIT Laboratory for Computer Science, Cambridge, MA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/859716.859718

ABSTRACT

Software errors are a major cause of outages and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors but it is expensive to deploy. This paper describes a replication technique, BASE, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BASE reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or nondeterministic service implementations, which reduces the probability of common mode failures. We built an NFS service where each replica can run a different off-the-shelf file system implementation, and an object-oriented database where the replicas ran the same, nondeterministic implementation. These examples suggest that our technique can be used in practice: in both cases, the implementation required only a modest amount of new code, and our performance results indicate that the replicated services perform comparably to the implementations that they reuse.
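
The idea of repairing a replica from "an abstract view of the state stored by correct replicas" can be illustrated with a toy sketch. This is not the BASE library or its API. The two store classes, the function names `abstract_state`, `agreed_state`, and `repair`, and the rule of accepting a state reported by at least f + 1 replicas are all assumptions made for this example. It only shows that replicas with different concrete representations can still be compared and repaired once their state is mapped to a common abstract form.

```python
from collections import Counter


class DictStore:
    """Hypothetical off-the-shelf implementation: keeps data in a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def items(self):
        return list(self._data.items())

    def load(self, pairs):
        # Rebuild the concrete state from abstract (key, value) pairs.
        self._data = dict(pairs)


class LogStore:
    """Second hypothetical implementation: append-only log, newest entry wins."""

    def __init__(self):
        self._log = []

    def put(self, key, value):
        self._log.append((key, value))

    def items(self):
        latest = {}
        for key, value in self._log:
            latest[key] = value
        return list(latest.items())

    def load(self, pairs):
        self._log = list(pairs)


def abstract_state(store):
    """Map a concrete store to a canonical form shared by all implementations."""
    return tuple(sorted(store.items()))


def agreed_state(replicas, f):
    """Return the abstract state reported by at least f + 1 replicas, else None."""
    counts = Counter(abstract_state(r) for r in replicas)
    state, votes = counts.most_common(1)[0]
    return state if votes >= f + 1 else None


def repair(replica, replicas, f):
    """Overwrite one replica's concrete state from the agreed abstract state."""
    state = agreed_state(replicas, f)
    if state is None:
        raise RuntimeError("no abstract state has f + 1 matching votes")
    replica.load(state)


# Four replicas (3f + 1 with f = 1) running two different implementations.
f = 1
replicas = [DictStore(), LogStore(), DictStore(), LogStore()]
for r in replicas:
    r.put("a", 1)
    r.put("b", 2)
    r.put("a", 3)

replicas[3].put("b", 99)          # simulate a corrupted replica
repair(replicas[3], replicas, f)  # restore it from the agreed abstract state
```

The concrete representations differ (a dict versus a log with stale entries), yet all correct replicas map to the same abstract state, which is what makes both the comparison and the repair possible in this sketch.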


