Tolerating hardware device failures in software
Source
ACM Symposium on Operating Systems Principles
Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles
Big Sky, Montana, USA
Session: Device drivers
Pages 59-72
Year of Publication: 2009
ISBN: 978-1-60558-752-3
Authors
Asim Kadav  University of Wisconsin-Madison, Madison, WI, USA
Matthew J. Renzelmann  University of Wisconsin-Madison, Madison, WI, USA
Michael M. Swift  University of Wisconsin-Madison, Madison, WI, USA
Sponsors
ACM: Association for Computing Machinery
SIGOPS: ACM Special Interest Group on Operating Systems
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1629575.1629582

ABSTRACT

Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. While major operating system and device vendors recommend that drivers detect and recover from hardware failures, we find that there are many drivers that will crash or hang when a device fails. Such bugs cannot easily be detected by regular stress testing because the failures are induced by the device and not the software load. This paper describes Carburizer, a code-manipulation tool and associated runtime that improves system reliability in the presence of faulty devices. Carburizer analyzes driver source code to find locations where the driver incorrectly trusts the hardware to behave. Carburizer identified almost 1000 such bugs in Linux drivers with a false positive rate of less than 8 percent. With the aid of shadow drivers for recovery, Carburizer can automatically repair 840 of these bugs with no programmer involvement. To facilitate proactive management of device failures, Carburizer can also locate existing driver code that detects device failures and insert missing failure-reporting code. Finally, the Carburizer runtime can detect and tolerate interrupt-related bugs, such as stuck or missing interrupts.
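The abstract does not list the specific code patterns involved, so the following is only an illustrative sketch of what "trusting the hardware to behave" might look like and how such code could be hardened. It assumes two example patterns: waiting on a status register without a bound, and using a device-supplied value as an array index. The simulated device and every name here (fake_device, read_status, wait_ready, lookup_handler, MAX_POLLS) are hypothetical and are not taken from Carburizer or from any Linux driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_READY 0x1u
#define MAX_POLLS    1000u  /* assumed bound; a real driver would more likely use a time limit */

/* Hypothetical simulated device standing in for a memory-mapped status register. */
struct fake_device {
    uint32_t reads;        /* number of status reads performed so far */
    uint32_t ready_after;  /* read count at which READY appears; UINT32_MAX = never */
};

/* Simulated register read: reports READY once enough reads have happened. */
static uint32_t read_status(struct fake_device *dev)
{
    dev->reads++;
    return (dev->reads >= dev->ready_after) ? STATUS_READY : 0u;
}

/*
 * Unhardened pattern (shown only as a comment, because it hangs on a stuck device):
 *     while (!(read_status(dev) & STATUS_READY))
 *         ;
 *
 * Hardened version: give up after MAX_POLLS reads and report the failure.
 * Returns 0 when the device became ready, -1 on timeout.
 */
static int wait_ready(struct fake_device *dev)
{
    for (uint32_t i = 0; i < MAX_POLLS; i++) {
        if (read_status(dev) & STATUS_READY)
            return 0;
    }
    return -1;
}

/*
 * Hardened use of a device-supplied index: check the range before touching
 * the table instead of assuming the hardware returned a sane value.
 * Returns 0 and stores the entry in *out, or -1 if the index is out of range.
 */
static int lookup_handler(const int *table, size_t table_len,
                          uint32_t dev_index, int *out)
{
    assert(table != NULL && out != NULL);  /* programmer error, not a device fault */
    if (dev_index >= table_len)
        return -1;
    *out = table[dev_index];
    return 0;
}
```

In this sketch a caller would treat a -1 from either function as a device failure and hand control to whatever recovery or reporting path the driver has, rather than spinning forever or reading past the end of the table.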

