Recovery domains: an organizing principle for recoverable operating systems
Source
Architectural Support for Programming Languages and Operating Systems
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems
Washington, DC, USA
SESSION: Reliable systems I
Pages 49-60  
Year of Publication: 2009
ISBN:978-1-60558-406-5
Authors
Andrew Lenharth  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Vikram S. Adve  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Samuel T. King  University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sponsors
SIGPLAN: ACM Special Interest Group on Programming Languages
SIGOPS: ACM Special Interest Group on Operating Systems
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1508244.1508251

ABSTRACT

We describe a strategy for enabling existing commodity operating systems to recover from unexpected run-time errors in nearly any part of the kernel, including core kernel components. Our approach is dynamic and request-oriented: it isolates the effects of a fault to the requests that caused the fault rather than to static kernel components. The approach is based on the notion of "recovery domains," an organizing principle that enables rollback of state affected by a request in a multithreaded system with minimal impact on other requests or threads. We have applied this approach to v2.4.22 and v2.6.27 of the Linux kernel. Doing so required 132 lines of changed or new code; all other changes are performed by a simple compiler instrumentation pass. Our experiments show that the approach can recover from otherwise fatal faults with minimal collateral impact during a recovery event.
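
The request-oriented rollback described in the abstract can be illustrated with a small user-space sketch in C. This is not the paper's implementation, and it does not model the kernel or the compiler pass. It assumes a simple per-request undo log; the names recovery_domain, logged_store, rollback, commit, and recovery_demo are hypothetical and chosen only for illustration. Each request records the old value before every write it makes, so a fault in one request can be undone without touching state written by another.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAPACITY 64

/* One logged write: where it happened and what was there before. */
struct undo_entry {
    int *addr;
    int  old_value;
};

/* Hypothetical per-request domain: an undo log of that request's writes. */
struct recovery_domain {
    struct undo_entry log[LOG_CAPACITY];
    size_t            count;
};

/* Stand-in for an instrumented store: save the old value, then write.
   Returns 0 on success, -1 if the log is full (the write is not done). */
static int logged_store(struct recovery_domain *d, int *addr, int value)
{
    if (d->count == LOG_CAPACITY)
        return -1;
    d->log[d->count].addr = addr;
    d->log[d->count].old_value = *addr;
    d->count++;
    *addr = value;
    return 0;
}

/* The request faulted: undo its writes, most recent first. */
static void rollback(struct recovery_domain *d)
{
    while (d->count > 0) {
        d->count--;
        *d->log[d->count].addr = d->log[d->count].old_value;
    }
}

/* The request finished normally: keep its writes, drop the log. */
static void commit(struct recovery_domain *d)
{
    d->count = 0;
}

/* Two requests touch different state; only the faulting one is undone. */
static int recovery_demo(void)
{
    int state_a = 1, state_b = 2;
    struct recovery_domain req1 = { .count = 0 };
    struct recovery_domain req2 = { .count = 0 };

    logged_store(&req1, &state_a, 10);
    logged_store(&req2, &state_b, 20);
    logged_store(&req1, &state_a, 11);

    rollback(&req1);  /* pretend req1 hit a fatal error */
    commit(&req2);    /* req2 completes unaffected */

    assert(state_a == 1);   /* req1's writes were undone */
    assert(state_b == 20);  /* req2's write survived */
    return 0;
}
```

Calling recovery_demo() from a main function exercises both paths. The sketch deliberately ignores what happens when two requests write the same location, as well as locking and any non-memory side effects; it makes no claim about how the paper handles those cases.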


