Fault tolerance under UNIX
Source: ACM Transactions on Computer Systems (TOCS)
Volume 7, Issue 1 (February 1989)
Pages: 1-24
Year of Publication: 1989
ISSN: 0734-2071
Authors
Anita Borg  Digital Equipment Corp., Palo Alto, CA
Wolfgang Blau  Tandem Computers GmbH, Frankfurt, W. Germany
Wolfgang Graetsch  Nixdorf Computer GmbH, Paderborn, W. Germany
Ferdinand Herrmann  Nixdorf Computer GmbH, Paderborn, W. Germany
Wolfgang Oberle  Nixdorf Computer GmbH, Paderborn, W. Germany
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/58564.58565

ABSTRACT

The initial design for a distributed, fault-tolerant version of UNIX based on three-way atomic message transmission was presented in an earlier paper [3]. The implementation effort then moved from Auragen Systems to Nixdorf Computer, where it was completed. This paper describes the working system, now known as the TARGON/32. The original design left open questions in at least two areas: fault tolerance for server processes and recovery after a crash were briefly and inaccurately sketched; rebackup after recovery was not discussed at all. The fundamental design involving three-way message transmission has remained unchanged. However, in addition to important changes in the implementation, server backup has been redesigned and is now more consistent with that of normal user processes. Recovery and rebackup have been completed in a less centralized and thus more efficient manner than previously envisioned. In this paper we review important aspects of the original design and note how the implementation differs from our original ideas. We then focus on the backup and recovery for server processes and the changes and additions in the design and implementation of recovery and rebackup.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
ARNOW, D., AND GLAZER, S. A fast safe file system for UNIX. Unpublished paper written in 1984 for Auragen Systems Corp., Ft. Lee, N.J.
2
3
4
5
6
 
7
LISKOV, B., AND LADIN, R. Highly-available distributed services and fault-tolerant distributed garbage collection. Programming Methodology Group Memo 48, MIT Laboratory for Computer Science, May, 1986.
8
9
 
10
RASHID, R., AND ROBERTSON, G. Accent: A communication-oriented network operating system kernel. Tech. Rep. CMU-CS-81-123, Dept. of Computer Science, Carnegie-Mellon Univ., Apr. 1981.
11
 
12
Stratus/32, VOS Reference Manual. Stratus Computers, Inc., Marlborough, Mass., 1982.
13
 
14
TOLERANT SYSTEMS, INC. Eternity series: Technology brief. Internal publication, July 1988, Tolerant Systems, San Jose, Calif.
15
 
16
WALTER, B. A robust and efficient protocol for checking the availability of remote sites. In Proceedings of the Sixth Workshop on Distributed Data Management and Computer Networks, (Berkeley, Calif., Feb. 1982), pp. 45-68.



REVIEW

Reviewer: Paul Siegel

After many years of relatively quiet use in Bell Labs, universities, and a few commercial development centers, the UNIX operating system has recently become more popular. Its portability is a major asset: applications developed on a machine run ...
