The Rio file cache: surviving operating system crashes
Source: ACM SIGOPS Operating Systems Review
Volume 30, Issue 5 (December 1996)
Pages: 74-83
Year of Publication: 1996
ISSN:0163-5980
Authors
Peter M. Chen, Computer Science and Engineering Division, Department of Electrical Engineering and Computer Science, University of Michigan
Wee Teck Ng, Computer Science and Engineering Division, Department of Electrical Engineering and Computer Science, University of Michigan
Subhachandra Chandra, Computer Science and Engineering Division, Department of Electrical Engineering and Computer Science, University of Michigan
Christopher Aycock, Computer Science and Engineering Division, Department of Electrical Engineering and Computer Science, University of Michigan
Gurushankar Rajamani, Computer Science and Engineering Division, Department of Electrical Engineering and Computer Science, University of Michigan
David Lowell, Computer Science and Engineering Division, Department of Electrical Engineering and Computer Science, University of Michigan
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/248208.237154

ABSTRACT

One of the fundamental limits to high-performance, high-reliability file systems is memory's vulnerability to system crashes. Because memory is viewed as unsafe, systems periodically write data back to disk. The extra disk traffic lowers performance, and the delay period before data is safe lowers reliability. The goal of the Rio (RAM I/O) file cache is to make ordinary main memory safe for persistent storage by enabling memory to survive operating system crashes. Reliable memory enables a system to achieve the best of both worlds: reliability equivalent to a write-through file cache, where every write is instantly safe, and performance equivalent to a pure write-back cache, with no reliability-induced writes to disk. To achieve reliability, we protect memory during a crash and restore it during a reboot (a "warm" reboot). Extensive crash tests show that even without protection, warm reboot enables memory to achieve reliability close to that of a write-through file system. Adding protection makes memory even safer than a write-through file system while adding essentially no overhead. By eliminating reliability-induced disk writes, Rio performs 4-22 times as fast as a write-through file system, 2-14 times as fast as a standard Unix file system, and 1-3 times as fast as an optimized system that risks losing 30 seconds of data and metadata.
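
The trade-off the abstract describes can be illustrated with a toy model. The sketch below is not Rio's implementation and does not reproduce the paper's measurements; it only counts reliability-induced disk writes for a made-up trace under three policies. The function name `count_disk_writes`, the policy labels, and the sample trace are all assumptions introduced for illustration. The 30-second default mirrors the delay mentioned in the abstract.

```python
def count_disk_writes(writes, policy, flush_interval=30):
    """Count block writes sent to disk purely for reliability.

    writes: list of (time_in_seconds, block_id) tuples, sorted by time.
    policy: "write_through", "delayed", or "write_back" (hypothetical labels).
    """
    if policy == "write_through":
        # Every write goes to disk immediately, so it is instantly safe.
        return len(writes)
    if policy == "write_back":
        # Pure write-back: memory is trusted, so no reliability-induced writes.
        return 0
    if policy == "delayed":
        # Dirty blocks are flushed every flush_interval seconds; repeated
        # writes to the same block within one interval cost one disk write.
        dirty = set()
        disk_writes = 0
        next_flush = flush_interval
        for t, block in writes:
            while t >= next_flush:
                disk_writes += len(dirty)
                dirty.clear()
                next_flush += flush_interval
            dirty.add(block)
        # Final flush once the trace ends.
        return disk_writes + len(dirty)
    raise ValueError(f"unknown policy: {policy}")


# Illustrative trace (invented data): block "a" is rewritten several times.
trace = [(0, "a"), (1, "a"), (2, "b"), (31, "a"), (32, "a")]
for policy in ("write_through", "delayed", "write_back"):
    print(policy, count_disk_writes(trace, policy))
```

In this model the delayed policy saves disk traffic only by leaving up to `flush_interval` seconds of data exposed to a crash. The abstract's claim is that memory which survives crashes removes that exposure while keeping the write-back count of zero.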


