TraceBack: first fault diagnosis by reconstruction of distributed control flow
Source: Conference on Programming Language Design and Implementation (PLDI)
Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation
Chicago, IL, USA
Session: Instrumentation and testing
Pages: 201-212
Year of Publication: 2005
ISBN: 1-59593-056-6
Authors
Andrew Ayers  Microsoft Corporation
Richard Schooler  Microsoft Corporation
Chris Metcalf  VERITAS Software / MIT CSAIL
Anant Agarwal  VERITAS Software / MIT CSAIL
Junghwan Rhee  University of Texas at Austin
Emmett Witchel  University of Texas at Austin
Sponsors
SIGPLAN: ACM Special Interest Group on Programming Languages
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1065010.1065035

ABSTRACT

Faults that occur in production systems are the most important faults to fix, but most production systems lack the debugging facilities present in development environments. TraceBack provides debugging information for production systems by supplying execution history data about program problems (such as crashes, hangs, and exceptions). TraceBack supports features commonly found in production environments, such as multiple threads, dynamically loaded modules, multiple source languages (e.g., Java applications running with JNI modules written in C++), and distributed execution across multiple computers. TraceBack supports first-fault diagnosis: discovering what went wrong the first time a fault is encountered. The user can see how the program reached the fault state without having to re-run the computation, in effect enabling a limited form of a debugger in production code.

TraceBack uses static, binary program analysis to inject low-overhead runtime instrumentation at control-flow block granularity. Post-facto reconstruction of the records written by the instrumentation code produces a source-statement trace for user diagnosis. The trace shows the dynamic instruction sequence leading up to the fault state, even when the program took exceptions or terminated abruptly (e.g., kill -9).

We have implemented TraceBack on a variety of architectures and operating systems, and present examples from a variety of platforms. Performance overhead varies: 5% for Apache running SPECweb99, 16%-25% for the Java SPECjbb benchmark, and 60% on average for SPECint2000. We show examples of TraceBack's cross-language and cross-machine abilities, and report its use in diagnosing problems in production software.
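To illustrate the general idea described in the abstract (instrumentation at control-flow block granularity whose records are later mapped back to source statements), here is a minimal Python sketch. It is not TraceBack's implementation: TraceBack injects its probes through static binary rewriting, whereas this sketch places the probe calls by hand. The names BlockTraceBuffer, reconstruct, BLOCK_LINES, and abs_value, the block-to-source-line map, and the choice of a bounded circular buffer that keeps only the most recent records are all hypothetical assumptions made for the example.

```python
from collections import deque


class BlockTraceBuffer:
    """Hypothetical bounded trace buffer holding recently executed block IDs."""

    def __init__(self, capacity):
        # A deque with maxlen discards its oldest record once full (circular behavior).
        self.records = deque(maxlen=capacity)

    def hit(self, block_id):
        # Probe that runs at the start of each control-flow block.
        self.records.append(block_id)


def reconstruct(records, block_lines):
    """Post-facto step: expand each recorded block ID into its source lines, in order."""
    trace = []
    for block_id in records:
        trace.extend(block_lines[block_id])
    return trace


# Hypothetical map from block ID to the source line numbers that block covers.
BLOCK_LINES = {0: [10, 11], 1: [12], 2: [14]}


def abs_value(x, buf):
    # Hand-instrumented stand-in for code a binary rewriter would instrument.
    buf.hit(0)  # entry block
    if x < 0:
        buf.hit(1)  # negation block
        x = -x
    buf.hit(2)  # exit block
    return x


if __name__ == "__main__":
    buf = BlockTraceBuffer(capacity=4)
    abs_value(-3, buf)
    abs_value(5, buf)
    print(list(buf.records))                      # [1, 2, 0, 2]
    print(reconstruct(buf.records, BLOCK_LINES))  # [12, 14, 10, 11, 14]
```

The two calls record blocks 0, 1, 2, 0, 2; with a capacity of 4 the oldest record is evicted, so the reconstructed trace covers only the most recent history. In this sketch, bounding the buffer keeps memory use fixed regardless of how long the program runs, at the cost of losing older history.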

