ABSTRACT
Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates realistic localized faults such as latent sector errors and block corruption. We then develop and apply a novel failure-policy fingerprinting framework to investigate how commodity file systems react to a range of more realistic disk failures. We classify their failure policies in a new taxonomy that measures their Internal RObustNess (IRON), which includes both failure detection and recovery techniques. We show that commodity file system failure policies are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures. Finally, we design, implement, and evaluate a prototype IRON file system, Linux ixt3, showing that techniques such as in-disk checksumming, replication, and parity greatly enhance file system robustness while incurring minimal time and space overheads.
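The three techniques named at the end of the abstract can be illustrated with a small sketch. This is not the ixt3 implementation: the class `ChecksummedStore`, the helper `xor_parity`, the in-memory dictionaries standing in for disk regions, and the choice of SHA-1 are all assumptions made for illustration. The sketch shows a checksum detecting a corrupted or unreadable block, a replica being used to recover and repair it, and XOR parity rebuilding one lost block from the survivors.

```python
import hashlib


class ChecksummedStore:
    """Toy block store (hypothetical): one checksum and one replica per block."""

    def __init__(self):
        self.primary = {}    # block number -> bytes (main copy)
        self.replica = {}    # block number -> bytes (second copy)
        self.checksums = {}  # block number -> digest, kept apart from the data

    def write(self, blkno, data):
        self.primary[blkno] = data
        self.replica[blkno] = data
        self.checksums[blkno] = hashlib.sha1(data).digest()

    def _valid(self, blkno, data):
        # A missing block (None) models a latent sector error;
        # a digest mismatch models silent block corruption.
        if data is None:
            return False
        return hashlib.sha1(data).digest() == self.checksums.get(blkno)

    def read(self, blkno):
        data = self.primary.get(blkno)
        if self._valid(blkno, data):
            return data
        copy = self.replica.get(blkno)
        if self._valid(blkno, copy):
            self.primary[blkno] = copy  # repair the damaged primary copy
            return copy
        raise IOError(f"block {blkno}: no valid copy available")


def xor_parity(blocks):
    """XOR equal-sized blocks together, byte by byte."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)
```

A caller would write a block, and a later `read` transparently returns the replica if the primary copy fails its checksum. For parity, XOR-ing the surviving blocks with the stored parity block yields the missing block, which is why a single parity block can cover the loss of any one block in its group.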
INDEX TERMS
Primary Classification:
D. Software
D.4 OPERATING SYSTEMS
D.4.3 File Systems Management
Additional Classification:
D. Software
D.4 OPERATING SYSTEMS
D.4.5 Reliability
General Terms: Design, Experimentation, Reliability
Keywords: IRON file systems, block corruption, disks, fail-partial failure model, fault tolerance, internal, latent sector errors, redundancy, reliability, storage