Quantifying and Improving the Availability of High-Performance Cluster-Based Internet Services
Source: Conference on High Performance Networking and Computing
Proceedings of the 2003 ACM/IEEE Conference on Supercomputing
Page: 27
Year of Publication: 2003
ISBN: 1-58113-695-1
Authors
Kiran Nagaraja  Rutgers University
Neeraj Krishnan  Rutgers University
Ricardo Bianchini  Rutgers University
Richard P. Martin  Rutgers University
Thu D. Nguyen  Rutgers University
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA

ABSTRACT

Cluster-based servers can substantially increase performance when nodes cooperate to globally manage resources. However, in this paper we show that cooperation results in a substantial availability loss, in the absence of high-availability mechanisms. Specifically, we show that a sophisticated cluster-based Web server, which gains a factor of 3 in performance through cooperation, increases service unavailability by a factor of 10 over a non-cooperative version. We then show how to augment this Web server with software components embodying a small set of high-availability techniques to regain the lost availability. Among other interesting observations, we show that the application of multiple high-availability techniques, each implemented independently in its own subsystem, can lead to inconsistent recovery actions. We also show that a novel technique called Fault Model Enforcement can be used to resolve such inconsistencies. Augmenting the server with these techniques led to a final expected availability of close to 99.99%.
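
To put the availability figures in the abstract in concrete terms, the following minimal sketch converts an availability fraction into expected downtime per year and shows how multiplying unavailability by a constant factor changes availability. The function names and the 99.9% baseline are hypothetical illustrations and do not come from the paper; only the factor of 10 and the 99.99% figure appear in the abstract.

```python
# Illustrative arithmetic only; names and the baseline value are assumptions,
# not taken from the paper.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of unavailability per year for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def scale_unavailability(availability: float, factor: float) -> float:
    """Availability that results when unavailability is multiplied by `factor`."""
    scaled = (1.0 - availability) * factor
    if scaled > 1.0:
        raise ValueError("scaled unavailability exceeds 100%")
    return 1.0 - scaled


if __name__ == "__main__":
    # 99.99% availability (the final figure quoted in the abstract)
    print(round(downtime_minutes_per_year(0.9999), 1), "minutes/year")

    # Hypothetical baseline of 99.9%: a 10x rise in unavailability
    # (the factor quoted in the abstract) would leave roughly 99% availability.
    print(round(scale_unavailability(0.999, 10), 4))
```

Under these assumptions, 99.99% availability corresponds to a little under an hour of downtime per year, which is why a tenfold increase in unavailability is a substantial loss even when the starting availability looks high.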

