UpRight cluster services
Source
ACM Symposium on Operating Systems Principles
Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles
Big Sky, Montana, USA
Session: Clusters
Pages 277-290
Year of Publication: 2009
ISBN:978-1-60558-752-3
Authors
Allen Clement  The University of Texas at Austin, Austin, TX, USA
Manos Kapritsos  The University of Texas at Austin, Austin, TX, USA
Sangmin Lee  The University of Texas at Austin, Austin, TX, USA
Yang Wang  The University of Texas at Austin, Austin, TX, USA
Lorenzo Alvisi  The University of Texas at Austin, Austin, TX, USA
Mike Dahlin  The University of Texas at Austin, Austin, TX, USA
Taylor Riche  The University of Texas at Austin, Austin, TX, USA
Sponsors
ACM: Association for Computing Machinery
SIGOPS: ACM Special Interest Group on Operating Systems
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1629575.1629602

ABSTRACT

The UpRight library seeks to make Byzantine fault tolerance (BFT) a simple and viable alternative to crash fault tolerance for a range of cluster services. We demonstrate UpRight by producing BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS). Our design choices in UpRight favor simplifying adoption by existing applications; performance is a secondary concern. Despite these priorities, our BFT Zookeeper and BFT HDFS implementations have performance comparable with the originals while providing additional robustness.

