ACM Home Page
Please provide us with feedback. Feedback
Practical byzantine fault tolerance and proactive recovery
Full text PdfPdf (1.63 MB)
Source ACM Transactions on Computer Systems (TOCS) archive
Volume 20 ,  Issue 4  (November 2002) table of contents
Pages: 398 - 461  
Year of Publication: 2002
ISSN:0734-2071
Authors
Miguel Castro  Microsoft Research
Barbara Liskov  MIT Laboratory for Computer Science
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 39,   Downloads (12 Months): 374,   Citation Count: 40
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/571637.571640
What is a DOI?

ABSTRACT

Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2% faster to 24% slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the BFT library can be used to build practical systems that tolerate Byzantine faults.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
Bellare, M. and Micciancio, D. 1997. A new paradigm for collision-free hashing: Incrementality at reduced cost. In Advances in Cryptology---EUROCRYPT' 97, Lecture Notes in Computer Science, vol. 1233, W. Fumy, Ed., Springer-Verlag, Konstanz, Germany, 163--192.
 
5
Bellare, M. and Rogaway, P. 1995. Optimal asymmetric encryption---How to encrypt with RSA. In Advances in Cryptology---EUROCRYPT 94, Lecture Notes in Computer Science, vol. 950, A. D. Santis, Ed., Springer-Verlag, Perugia, Italy, 92--111.
 
6
Bellare, M. and Rogaway, P. 1996. The exact security of digital signatures. How to sign with RSA and Rabin. In Advances in Cryptology---EUROCRYPT 96, Lecture Notes in Computer Science, vol. 1070, U. Maurer, Ed., Springer-Verlag, Zaragoza, Spain, 399--416.
 
7
 
8
 
9
Blum, M., Evans, W., Gemmel, P., Kannan, S., and Naor, M. 1994. Checking the correctness of memories. Algorithmica 12, 225--244.
10
11
 
12
Canetti, R. and Rabin, T. 1992. Optimal asynchronous byzantine agreement. Tech. Rep. #92-15, Computer Science Department, Hebrew University.
 
13
Canetti, R., Halevi, S., and Herzberg, A. 1997. Maintaining authenticated communication in the presence of break-ins. In Proceedings of the Fourth ACM Conference on Computers and Communication Security, ACM Press, Zurich, Switzerland.
 
14
Castro, M. 2001. Practical Byzantine fault tolerance. Tech. Rep. MIT/LCS/TR-817, MIT Laboratory for Computer Science. January.
 
15
 
16
 
17
 
18
Cristian, F., Aghili, H., Strong, R., and Dolev, D. 1985. Atomic broadcast: From simple message diffusion to Byzantine agreement. In Proceedings of the Fifteenth International Conference on Fault Tolerant Computing, IEEE Computer Society Press, Ann Arbor, Mich.
19
 
20
 
21
22
 
23
Fu, K., Kaashoek, M. F., and Mazières, D. 2000. Fast and secure distributed read-only file system. In Proceedings of the Fourth USENIX Symposium on Operating Systems Design and Implementation (OSDI 2000), USENIX, San Diego.
 
24
 
25
26
27
 
28
Gray, J. 2000. FT 101. Talk at the University of California at Berkeley.
29
30
 
31
32
 
33
Katcher, J. 1997. PostMark: A new file system benehmark. Tech. Rep. TR-3022, Network Appliance. October.
34
 
35
 
36
 
37
Lamport, L. 1977. Proving the correctness of multiprocess programs. IEEE Trans. Softw. Eng. 3, 2 (Nov.), 125--143.
38
39
 
40
Lamport, L. 1989. The part-time parliament. Research Rep. 49, Digital Equipment Corporation Systems Research Center, Palo Alto, Sept.
41
42
 
43
Liskov, B. and Zilles, S. 1975. Specification techniques for data abstractions. IEEE Trans. Softw. Eng. SE-1, 1 (Mar.), 7--17.
44
 
45
 
46
Maheshwari, U., Vingralek, R., and Shapiro, B. 2000. How to build a trusted database system on untrusted storage. In Proceedings of the Fourth USENIX Symposium on Operating Systems Design and Implementation (OSDI 2000), USENIX, San Diego.
 
47
 
48
Malkhi, D. and Reiter, M. 1996b. Unreliable intrusion detection in distributed computations. In Proceedings of the Ninth Computer Security Foundations Workshop, IEEE Computer Society Press, Ireland, 9--17.
 
49
 
50
 
51
 
52
Malkhi, D., Reiter, M., and Lynch, N. 1998. A correctness condition for memory shared by Byzantine processes (Submitted).
53
 
54
 
55
Minnich, R. 2000. The Linux BIOS home page. Available at http://www.acl.lanl.gov/linuxbios.
 
56
Murphy, B. and Levidow, B. 2000. Windows 2000 dependability. In Proceedings of IEEE International Conference on Dependable Systems and Networks, IEEE Computer Society Press, New York.
57
58
 
59
Ousterhout, J. 1990. Why aren't operating systems getting faster as fast as hardware? In Proceedings of USENIX Summer Conference, USENIX, Anaheim, Calif., 247--256.
60
 
61
Postel, J. 1980. User datagram protocol. DARPA-Internet RFC-768.
62
 
63
 
64
 
65
Rivest, R. 1992. The MD5 message-digest algorithm. Internet RFC-1321.
66
 
67
Sandberg, R., Goldberg, D., Kleiman, S., Walsh, D., and Lyon, B. 1985. Design and implementation of the sun network filesystem. In Proceedings of the Summer 1985 USENIX Conference, USENIX, Portland, Oreo, 119--130.
68
69
 
70
Schneier, B. 1996. Applied Cryptography. Wiley, New York.
 
71
SHA1 1994. Announcement of Weakness in Secure Hash Standard.
 
72
Wensley, J., Lamport, L., Goldberg, J., Green, M., Levitt, K., Melliar-Smith, M., Shostak, R., and Weinstock, C. 1978. SIFT: Design and analysis of a fault-tolerant computer for aircraft control. Proc. IEEE 66, 10 (Oct.), 1240--1255.
 
73

CITED BY  40


REVIEW

"Cristiano di Flora : Reviewer"

The most challenging faults in the context of modern wide-area distributed applications are those caused by software bugs, intentional attacks, and operator mistakes. Such faults, called Byzantine faults, cause arbitrary behavior of the overall sy  more...

Collaborative Colleagues:
Miguel Castro: colleagues
Barbara Liskov: colleagues