Practical and low-overhead masking of failures of TCP-based servers
Source
ACM Transactions on Computer Systems (TOCS)
Volume 27, Issue 2 (May 2009)
Article No. 4
Year of Publication: 2009
ISSN: 0734-2071
Authors
Dmitrii Zagorodnov  University of California, Santa Barbara, Santa Barbara, CA
Keith Marzullo  University of California, San Diego, La Jolla, CA
Lorenzo Alvisi  The University of Texas at Austin, Austin, TX
Thomas C. Bressoud  Denison University, Granville, OH
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1534909.1534911

ABSTRACT

This article describes an architecture that allows a replicated service to survive crashes without breaking its TCP connections. Our approach does not require modifications to the TCP protocol, to the operating system on the server, or to any of the software running on the clients. Furthermore, it runs on commodity hardware. We compare two implementations of this architecture (one based on primary/backup replication and another based on message logging) focusing on scalability, failover time, and application transparency. We evaluate three types of services: a file server, a Web server, and a multimedia streaming server. Our experiments suggest that the approach incurs low overhead on throughput, scales well as the number of clients increases, and allows recovery of the service in near-optimal time.

