ABSTRACT
Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures. This paper describes Nooks, a reliability subsystem that seeks to greatly enhance OS reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to existing driver and system code. To achieve this, Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver's use of kernel resources to hasten automatic clean-up during recovery.

To prove the viability of our approach, we implemented Nooks in the Linux operating system and used it to fault-isolate several device drivers. Our results show that Nooks offers a substantial increase in the reliability of operating systems, catching and quickly recovering from many faults that would otherwise crash the system. In a series of 2000 fault-injection tests, Nooks recovered automatically from 99% of the faults that caused Linux to crash.

While Nooks was designed for drivers, our techniques generalize to other kernel extensions, as well. We demonstrate this by isolating a kernel-mode file system and an in-kernel Internet service. Overall, because Nooks supports existing C-language extensions, runs on a commodity operating system and hardware, and enables automated recovery, it represents a substantial step beyond the specialized architectures and type-safe languages required by previous efforts directed at safe extensibility.