IRON file systems
Source: ACM SIGOPS Operating Systems Review
Volume 39, Issue 5 (December 2005)
SOSP '05
Session: Filesystems
Pages: 206-220
Year of Publication: 2005
ISSN:0163-5980
Authors
Vijayan Prabhakaran  University of Wisconsin, Madison
Lakshmi N. Bairavasundaram  University of Wisconsin, Madison
Nitin Agrawal  University of Wisconsin, Madison
Haryadi S. Gunawi  University of Wisconsin, Madison
Andrea C. Arpaci-Dusseau  University of Wisconsin, Madison
Remzi H. Arpaci-Dusseau  University of Wisconsin, Madison
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 23,   Downloads (12 Months): 178,   Citation Count: 28
DOI: http://doi.acm.org/10.1145/1095809.1095830

ABSTRACT

Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates realistic localized faults such as latent sector errors and block corruption. We then develop and apply a novel failure-policy fingerprinting framework, to investigate how commodity file systems react to a range of more realistic disk failures. We classify their failure policies in a new taxonomy that measures their Internal RObustNess (IRON), which includes both failure detection and recovery techniques. We show that commodity file system failure policies are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures. Finally, we design, implement, and evaluate a prototype IRON file system, Linux ixt3, showing that techniques such as in-disk checksumming, replication, and parity greatly enhance file system robustness while incurring minimal time and space overheads.
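
As an illustration only, the following sketch shows how per-block checksums and a parity block can work together under the fail-partial model described in the abstract: a checksum mismatch detects a corrupted block, and XOR parity over the remaining blocks rebuilds it. This is not the ixt3 implementation. The `ToyStripe` class, the `BLOCK_SIZE` constant, the use of CRC32, and the in-memory layout are all assumptions made for the example.

```python
import zlib
from functools import reduce

BLOCK_SIZE = 16  # hypothetical block size, kept tiny for illustration


def checksum(block: bytes) -> int:
    """CRC32 stands in for whatever checksum a real file system would use."""
    return zlib.crc32(block)


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ToyStripe:
    """Hypothetical group of data blocks protected by checksums and one parity block."""

    def __init__(self, blocks):
        self.blocks = [bytes(b) for b in blocks]
        # Checksums are kept apart from the data they protect.
        self.sums = [checksum(b) for b in self.blocks]
        self.parity = xor_blocks(self.blocks)

    def read(self, i: int) -> bytes:
        block = self.blocks[i]
        # Detection: a missing block or a checksum mismatch signals a partial failure.
        if block is not None and checksum(block) == self.sums[i]:
            return block
        # Recovery: XOR the parity block with every other data block.
        others = [b for j, b in enumerate(self.blocks) if j != i]
        if any(b is None for b in others):
            raise OSError("more than one block lost; single parity cannot recover")
        recovered = xor_blocks(others + [self.parity])
        if checksum(recovered) != self.sums[i]:
            raise OSError("reconstruction failed checksum verification")
        self.blocks[i] = recovered  # repair the bad copy in place
        return recovered


stripe = ToyStripe([b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"C" * BLOCK_SIZE])
stripe.blocks[1] = b"B" * (BLOCK_SIZE - 1) + b"X"  # simulate silent block corruption
repaired = stripe.read(1)
stripe.blocks[2] = None  # simulate a latent sector error (unreadable block)
rebuilt = stripe.read(2)
```

In this toy, a single parity block tolerates one bad block per stripe at a time. The checksum is what distinguishes a silently corrupted block from a good one, which parity alone cannot do.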

