ABSTRACT
This article describes an architecture that allows a replicated service to survive crashes without breaking its TCP connections. Our approach does not require modifications to the TCP protocol, to the operating system on the server, or to any of the software running on the clients. Furthermore, it runs on commodity hardware. We compare two implementations of this architecture (one based on primary/backup replication and another based on message logging), focusing on scalability, failover time, and application transparency. We evaluate three types of services: a file server, a Web server, and a multimedia streaming server. Our experiments suggest that the approach incurs low overhead on throughput, scales well as the number of clients increases, and allows recovery of the service in near-optimal time.