ABSTRACT
We introduce a new reliability infrastructure for file systems called I/O shepherding. I/O shepherding allows a file system developer to craft nuanced reliability policies to detect and recover from a wide range of storage system failures. We incorporate shepherding into the Linux ext3 file system through a set of changes to the consistency management subsystem, layout engine, disk scheduler, and buffer cache. The resulting file system, CrookFS, enables a broad class of policies to be easily and correctly specified. We implement numerous policies, incorporating data protection techniques such as retry, parity, mirrors, checksums, sanity checks, and data structure repairs; even complex policies can be implemented in less than 100 lines of code, confirming the power and simplicity of the shepherding framework. We also demonstrate that shepherding is properly integrated, adding less than 5% overhead to the I/O path.