A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Source: International Symposium on Computer Architecture
Proceedings of the 30th Annual International Symposium on Computer Architecture
San Diego, California
SESSION: Recovery and replay
Pages: 122 - 135  
Year of Publication: 2003
ISBN:0-7695-1945-8
Authors
Min Xu  Univ. of Wisconsin-Madison
Rastislav Bodik  Univ. of Wisconsin-Madison
Mark D. Hill  Univ. of Wisconsin-Madison
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 86,   Citation Count: 49
DOI: http://doi.acm.org/10.1145/859618.859633

ABSTRACT

Debuggers have proven indispensable in improving software reliability. Unfortunately, on most real-life software, debuggers fail to deliver their most essential feature: a faithful replay of the execution. The reason is non-determinism caused by multithreading and non-repeatable inputs. A common solution to faithful replay has been to record the non-deterministic execution. Existing recorders, however, either work only for data-race-free programs or have prohibitive overhead.

As a step towards powerful debugging, we develop a practical low-overhead hardware recorder for cache-coherent multiprocessors, called Flight Data Recorder (FDR). Like an aircraft flight data recorder, FDR continuously records the execution, even on deployed systems, logging the execution for post-mortem analysis.

FDR is practical because it piggybacks on the cache coherence hardware and logs nearly the minimal thread-ordering information necessary to faithfully replay the multiprocessor execution. Our studies, based on simulating a four-processor server with commercial workloads, show that when allocated less than 7% of the system's physical memory, our FDR design can capture the last one second of the execution at modest (less than 2%) slowdown.
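
The record-then-replay idea in the abstract can be illustrated with a small software analogy. The sketch below is not the FDR hardware design: it assumes a toy model in which each "thread" is a list of operations on one shared variable, and it logs the thread chosen at every step, which is far more than the near-minimal ordering information FDR records. The names `run`, `add`, `mul`, and `programs` are hypothetical and exist only for this illustration.

```python
import random


def run(programs, schedule=None, seed=None):
    """Execute per-thread operation lists against one shared memory.

    With schedule=None the next thread is picked at random (standing in
    for a non-deterministic execution) and every choice is logged.
    With a schedule, the logged order is followed exactly (the replay).
    Returns (final_memory, log_of_thread_ids).
    """
    rng = random.Random(seed)
    memory = {"x": 0}
    pcs = [0] * len(programs)  # per-thread program counters
    log = []
    while any(pc < len(p) for pc, p in zip(pcs, programs)):
        if schedule is None:
            ready = [t for t, p in enumerate(programs) if pcs[t] < len(p)]
            tid = rng.choice(ready)
        else:
            tid = schedule[len(log)]
        programs[tid][pcs[tid]](memory)  # perform the thread's next operation
        pcs[tid] += 1
        log.append(tid)
    return memory, log


def add(n):
    """Operation: x = x + n."""
    return lambda m: m.__setitem__("x", m["x"] + n)


def mul(n):
    """Operation: x = x * n."""
    return lambda m: m.__setitem__("x", m["x"] * n)


# Two threads whose operations do not commute, so the result depends
# on the interleaving.
programs = [[add(1), mul(2)], [add(10), mul(3)]]

recorded_memory, order_log = run(programs)                # record
replayed_memory, _ = run(programs, schedule=order_log)    # replay
assert replayed_memory == recorded_memory
```

Because the operations do not commute, different interleavings give different final values of `x`; feeding the logged order back in reproduces the recorded result regardless of which interleaving occurred.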


