First-aid: surviving and preventing memory management bugs during production runs
Source
European Conference on Computer Systems
Proceedings of the 4th ACM European Conference on Computer Systems
Nuremberg, Germany
SESSION: Real, running systems
Pages 159-172  
Year of Publication: 2009
ISBN:978-1-60558-482-9
Authors
Qi Gao  Ohio State University, Columbus, OH, USA
Wenbin Zhang  Ohio State University, Columbus, OH, USA
Yan Tang  Ohio State University, Columbus, OH, USA
Feng Qin  Ohio State University, Columbus, OH, USA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1519065.1519083

ABSTRACT

Memory bugs in C/C++ programs severely affect system availability and security. This paper presents First-Aid, a lightweight runtime system that survives software failures caused by common memory management bugs and prevents future failures caused by the same bugs during production runs. Upon a failure, First-Aid diagnoses the bug type and identifies the memory objects that trigger the bug. To do so, it rolls back the program to previous checkpoints and uses two types of environmental changes that can prevent or expose memory bug manifestation during re-execution. Based on the diagnosis, First-Aid generates and applies runtime patches to avoid the memory bug and prevent its reoccurrence. Furthermore, First-Aid validates the consistent effects of the runtime patches and generates on-site diagnostic reports to assist developers in fixing the bugs.
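
The diagnosis idea described above (re-execute under an environmental change and see whether the failure disappears) can be illustrated with a toy model. The sketch below is not First-Aid's implementation: the heap model, the three changes (padding, delayed free, zero fill), and every name in it are assumptions chosen for illustration, and it models only changes that prevent a bug, not ones that expose it. "Rollback" is simplified to re-running the program on a fresh heap.

```python
class ToyHeap:
    """A tiny bump allocator over a bytearray (illustrative only)."""

    def __init__(self, size=256, padding=0, delay_free=False, zero_fill=False):
        self.mem = bytearray(b"\xAA" * size)  # 0xAA stands in for stale garbage
        self.top = 0
        self.free_list = []                   # (addr, size) blocks open for reuse
        self.padding = padding
        self.delay_free = delay_free
        self.zero_fill = zero_fill

    def malloc(self, n):
        # Recycle a freed block of the same size if one exists.
        for i, (addr, size) in enumerate(self.free_list):
            if size == n:
                del self.free_list[i]
                break
        else:
            addr = self.top
            self.top += n + self.padding      # padding absorbs small overflows
        if self.zero_fill:
            self.mem[addr:addr + n] = bytes(n)
        return addr

    def free(self, addr, n):
        if not self.delay_free:               # delayed free: never recycle
            self.free_list.append((addr, n))


# Each toy program returns True on success and False on a (simulated) failure.
def overflow_bug(heap):
    a = heap.malloc(8)
    b = heap.malloc(8)
    heap.mem[b:b + 8] = b"GOODDATA"
    heap.mem[a:a + 12] = b"X" * 12            # writes 4 bytes past the end of a
    return heap.mem[b:b + 8] == b"GOODDATA"


def dangling_bug(heap):
    a = heap.malloc(8)
    heap.mem[a:a + 8] = b"GOODDATA"
    heap.free(a, 8)
    c = heap.malloc(8)                        # may recycle a's block
    heap.mem[c:c + 8] = b"OTHEROBJ"
    return heap.mem[a:a + 8] == b"GOODDATA"   # read through the dangling a


def uninit_bug(heap):
    a = heap.malloc(4)
    return heap.mem[a:a + 4] == bytes(4)      # wrongly assumes zeroed memory


# Hypothetical mapping from bug type to the change that should prevent it.
CHANGES = {
    "buffer overflow": dict(padding=16),
    "dangling pointer": dict(delay_free=True),
    "uninitialized read": dict(zero_fill=True),
}


def diagnose(program):
    """Re-run the program under each change; report the one that averts failure."""
    if program(ToyHeap()):
        return None                           # no failure under normal execution
    for bug_type, change in CHANGES.items():
        if program(ToyHeap(**change)):
            return bug_type
    return "unknown"
```

In this model, calling `diagnose(overflow_bug)` re-runs the program with each change in turn and reports the bug type whose change made the failure go away. A runtime patch, in the same spirit, would keep that change enabled only for the allocations involved.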

We have implemented First-Aid on Linux and evaluated it with seven applications that contain various types of memory bugs, including buffer overflow, uninitialized read, dangling pointer read/write, and double free. The results show that First-Aid can quickly diagnose the tested bugs and recover applications from failures (in 0.084 to 3.978 seconds). The results also show that the runtime patches generated by First-Aid can prevent future failures caused by the diagnosed bugs. Additionally, First-Aid provides detailed diagnostic information on both the root cause and the manifestation of the bugs. Furthermore, First-Aid incurs low overhead (0.4% to 11.6%, with an average of 3.7%) during normal execution for the tested buggy applications, SPEC INT2000, and four allocation-intensive programs.

