Fastpath Optimizations for Cluster Recovery in Shared-Disk Systems
Source: Conference on High Performance Networking and Computing
Proceedings of the 2004 ACM/IEEE Conference on Supercomputing
Page: 5  
Year of Publication: 2004
ISBN:0-7695-2153-3
Author
Randal Burns  Johns Hopkins University
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA
DOI Bookmark: 10.1109/SC.2004.25

ABSTRACT

We describe the design and implementation of a clustering service for a high-performance, shared-disk file system. The service provides failure detection and recovery, reliable end-to-end messaging, and a centralized and recoverable management interface. We implement novel optimizations in the voting protocol that resolves cluster membership. These optimizations allow clusters to form as quickly as possible without introducing livelock or requiring timeout parameters to be tuned carefully. Our treatment includes performance results that quantify the scalability of the system and measure recovery times.

