|
ABSTRACT
Faults that occur in production systems are the most important faults to fix, but most production systems lack the debugging facilities present in development environments. TraceBack provides debugging information for production systems by providing execution history data about program problems (such as crashes, hangs, and exceptions). TraceBack supports features commonly found in production environments such as multiple threads, dynamically loaded modules, multiple source languages (e.g., Java applications running with JNI modules written in C++), and distributed execution across multiple computers. TraceBack supports first fault diagnosis-discovering what went wrong the first time a fault is encountered. The user can see how the program reached the fault state without having to re-run the computation; in effect enabling a limited form of a debugger in production code.TraceBack uses static, binary program analysis to inject low-overhead runtime instrumentation at control-flow block granularity. Post-facto reconstruction of the records written by the instrumentation code produces a source-statement trace for user diagnosis. The trace shows the dynamic instruction sequence leading up to the fault state, even when the program took exceptions or terminated abruptly (e.g., kill -9).We have implemented TraceBack on a variety of architectures and operating systems, and present examples from a variety of platforms. Performance overhead is variable, from 5% for Apache running SPECweb99, to 16%-25% for the Java SPECJbb benchmark, to 60% average for SPECint2000. We show examples of TraceBack's cross-language and cross-machine abilities, and report its use in diagnosing problems in production software.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Alex Aiken , Jeffrey S. Foster , John Kodumal , Tachio Terauchi, Checking and inferring local non-aliasing, Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, June 09-11, 2003, San Diego, California, USA
|
 |
2
|
Marcos K. Aguilera , Jeffrey C. Mogul , Janet L. Wiener , Patrick Reynolds , Athicha Muthitacharoen, Performance debugging for distributed systems of black boxes, Proceedings of the nineteenth ACM symposium on Operating systems principles, October 19-22, 2003, Bolton Landing, NY, USA
|
 |
3
|
Vasanth Bala , Evelyn Duesterwald , Sanjeev Banerjia, Dynamo: a transparent dynamic optimization system, Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, p.1-12, June 18-21, 2000, Vancouver, British Columbia, Canada
|
| |
4
|
|
| |
5
|
|
| |
6
|
Mike Y. Chen , Emre Kiciman , Eugene Fratkin , Armando Fox , Eric Brewer, Pinpoint: Problem Determination in Large, Dynamic Internet Services, Proceedings of the 2002 International Conference on Dependable Systems and Networks, p.595-604, June 23-26, 2002
|
| |
7
|
Chernoff, A., and Hookway, R. DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT. In Proceedings of the USENIX Windows NT Workshop (Seattle, WA, Aug. 1997).
|
| |
8
|
|
 |
9
|
|
 |
10
|
Jeremy Condit , Matthew Harren , Scott McPeak , George C. Necula , Westley Weimer, CCured in the real world, Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, June 09-11, 2003, San Diego, California, USA
|
| |
11
|
Cook, J. C. Reverse execution of Java Bytecode. In The Computer Journal,. Vol 45 (6), 2002.
|
 |
12
|
|
 |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
 |
18
|
|
 |
19
|
Ben Liblit , Alex Aiken , Alice X. Zheng , Michael I. Jordan, Bug isolation via remote program sampling, Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, June 09-11, 2003, San Diego, California, USA
|
| |
20
|
Microsoft Corporation. Phoenix compiler infrastructure. http://research.microsoft.com/phoenix April 2005.
|
| |
21
|
Nandy, S., Xiaofeng, G., and Ferrante, J. TFP: Time-sensitive, Flow-specific Profiling at Runtime. In Workshop on Languages and Compilers for Parallel Computing, 2003.
|
| |
22
|
National Software Testing Laboratories. NSTL Final Report for Rational Software: Performance test of Rational Software's software product Purify. October 1997. http://www.rational.com/media/whitepapers/pnt-nstl.pdf
|
| |
23
|
Nethercote, N. Dynamic Binary Analysis and Instrumentation. Ph.D. dissertation, University of Cambridge, 2004.
|
| |
24
|
Romer, T., Voelker, G., Lee, D., Wolman, A., Wong, W., Levy, H., Bershad, B., and Chen, J. Instrumentation and Optimization of Win32/Intel Executables Using Etch. In Proceedings of the USENIX Windows NT Workshop,1997.
|
 |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
|
| |
29
|
Virtutech Corporation. Hindsight. http://www.virtutech.com March, 2005.
|
 |
30
|
|
CITED BY 6
|
|
Sanjay Bhansali , Wen-Ke Chen , Stuart de Jong , Andrew Edwards , Ron Murray , Milenko Drinić , Darek Mihočka , Joe Chau, Framework for instruction-level tracing and analysis of program executions, Proceedings of the second international conference on Virtual execution environments, June 14-16, 2006, Ottawa, Ontario, Canada
|
|
|
|
|
|
Jungwoo Ha , Christopher J. Rossbach , Jason V. Davis , Indrajit Roy , Hany E. Ramadan , Donald E. Porter , David L. Chen , Emmett Witchel, Improved error reporting for software that uses black-box components, ACM SIGPLAN Notices, v.42 n.6, June 2007
|
|
|
|
|
|
|
|
|
Anupam Chanda , Khaled Elmeleegy , Alan L. Cox , Willy Zwaenepoel, Causeway: support for controlling and analyzing the execution of multi-tier applications, Proceedings of the ACM/IFIP/USENIX 2005 International Conference on Middleware, p.42-59, November 01-01, 2005, Grenoble, France
|
|