BASE: using abstraction to improve fault tolerance
Full text: PDF (1.47 MB)
Source: ACM Symposium on Operating Systems Principles
Proceedings of the eighteenth ACM Symposium on Operating Systems Principles
Banff, Alberta, Canada
Session: Trust and dependability
Pages: 15-28
Year of Publication: 2001
ISBN: 1-58113-389-8
Also published in ...
Authors
Rodrigo Rodrigues  MIT Laboratory for Computer Science, Cambridge, MA
Miguel Castro  Microsoft Research Ltd., Cambridge, UK
Barbara Liskov  MIT Laboratory for Computer Science, Cambridge, MA
Sponsor
SIGOPS: ACM Special Interest Group on Operating Systems
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 55,   Citation Count: 20
DOI: http://doi.acm.org/10.1145/502034.502037

ABSTRACT

Software errors are a major cause of outages and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors but it is expensive to deploy. This paper describes a replication technique, BASE, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BASE reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or non-deterministic service implementations, which reduces the probability of common mode failures. We built an NFS service where each replica can run a different off-the-shelf file system implementation, and an object-oriented database where the replicas ran the same, non-deterministic implementation. These examples suggest that our technique can be used in practice: in both cases, the implementation required only a modest amount of new code, and our performance results indicate that the replicated services perform comparably to the implementations that they reuse.
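
The idea in the abstract can be illustrated with a toy sketch. This is not the BASE library or its API: the class names (`DictStore`, `ListStore`, `ConformanceWrapper`), the function `agreed_state`, and the key-value service are all hypothetical, and the real system's agreement and state-transfer protocols are far more involved. The sketch only shows how a wrapper can map two different concrete implementations onto one abstract state, so that replicas can be compared and a faulty one repaired from the abstract view, assuming an abstract state is trusted once f + 1 replicas report it.

```python
from collections import Counter


class DictStore:
    """Hypothetical implementation A: entries kept in a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def dump(self):
        return list(self._data.items())


class ListStore:
    """Hypothetical implementation B: entries kept in a list, newest first."""

    def __init__(self):
        self._entries = []

    def put(self, key, value):
        self._entries = [(k, v) for k, v in self._entries if k != key]
        self._entries.insert(0, (key, value))

    def dump(self):
        return list(self._entries)


class ConformanceWrapper:
    """Hides implementation differences behind a common abstract state."""

    def __init__(self, impl):
        self.impl = impl

    def execute(self, key, value):
        self.impl.put(key, value)

    def abstract_state(self):
        # Abstraction function: a sorted tuple ignores internal layout.
        return tuple(sorted(self.impl.dump()))

    def repair_from(self, abstract_state):
        # Inverse abstraction: rebuild a fresh concrete state.
        self.impl = type(self.impl)()
        for key, value in abstract_state:
            self.impl.put(key, value)


def agreed_state(replicas, f):
    """Return an abstract state reported by at least f + 1 replicas, else None."""
    counts = Counter(r.abstract_state() for r in replicas)
    state, votes = counts.most_common(1)[0]
    return state if votes >= f + 1 else None


# Usage: four replicas (tolerating f = 1 fault) running two implementations.
f = 1
replicas = [ConformanceWrapper(cls()) for cls in (DictStore, ListStore, DictStore, ListStore)]
for r in replicas:
    r.execute("b", 2)
    r.execute("a", 1)

replicas[3].impl.put("a", 999)        # simulate a corrupted replica
good = agreed_state(replicas, f)      # abstract view held by correct replicas
replicas[3].repair_from(good)         # repair the faulty replica
```

The two stores order their entries differently, yet their abstract states match, which is what lets replicas with distinct implementations be compared at all.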


