Fingerprinting: bounding soft-error detection latency and bandwidth
Full text: PDF (230 KB)
Source: Architectural Support for Programming Languages and Operating Systems
Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Boston, MA, USA
SESSION: Reliability
Pages: 224-234
Year of Publication: 2004
ISBN: 1-58113-804-0
Authors
Jared C. Smolens  Carnegie Mellon University, Pittsburgh, PA
Brian T. Gold  Carnegie Mellon University, Pittsburgh, PA
Jangwoo Kim  Carnegie Mellon University, Pittsburgh, PA
Babak Falsafi  Carnegie Mellon University, Pittsburgh, PA
James C. Hoe  Carnegie Mellon University, Pittsburgh, PA
Andreas G. Nowatzyk  Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGPLAN: ACM Special Interest Group on Programming Languages
SIGOPS: ACM Special Interest Group on Operating Systems
SIGARCH: ACM Special Interest Group on Computer Architecture
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 86,   Citation Count: 11
DOI: http://doi.acm.org/10.1145/1024393.1024420

ABSTRACT

Recent studies have suggested that the soft-error rate in microprocessor logic will become a reliability concern by 2010. This paper proposes an efficient error detection technique, called fingerprinting, that detects differences in execution across a dual modular redundant (DMR) processor pair. Fingerprinting summarizes a processor's execution history in a hash-based signature; differences between two mirrored processors are exposed by comparing their fingerprints. Fingerprinting tightly bounds detection latency and greatly reduces the interprocessor communication bandwidth required for checking. This paper presents a study that evaluates fingerprinting against a range of current approaches to error detection. The results of this study show that fingerprinting is the only error detection mechanism that simultaneously allows high error coverage, low error detection bandwidth, and high I/O performance.
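
The idea in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design: the choice of CRC-16-CCITT as the hash, the `Fingerprint` class, the helper names, and the encoding of execution history as (destination, value) update pairs are all assumptions made for illustration. Two mirrored executions each fold their state updates into a 16-bit signature and compare only that signature once per checking interval.

```python
from typing import Iterable, List, Tuple

# Hypothetical encoding of one retired update: (destination id, value written).
Update = Tuple[int, int]


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16-CCITT (polynomial 0x1021), continuing from `crc`."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


class Fingerprint:
    """Accumulates a 16-bit signature over a stream of state updates."""

    def __init__(self) -> None:
        self.value = 0xFFFF

    def update(self, dest: int, value: int) -> None:
        # Assumes non-negative integers that fit in 64 bits.
        payload = dest.to_bytes(8, "little") + value.to_bytes(8, "little")
        self.value = crc16_ccitt(payload, self.value)

    def take(self) -> int:
        """Return the fingerprint for this interval and reset for the next."""
        fp, self.value = self.value, 0xFFFF
        return fp


def interval_fingerprints(trace: Iterable[Update], interval: int) -> List[int]:
    """One fingerprint per `interval` updates, plus one for any remainder."""
    fp = Fingerprint()
    out: List[int] = []
    count = 0
    for dest, value in trace:
        fp.update(dest, value)
        count += 1
        if count == interval:
            out.append(fp.take())
            count = 0
    if count:
        out.append(fp.take())
    return out


def first_mismatch(a: List[int], b: List[int]) -> int:
    """Index of the first interval whose fingerprints differ, or -1 if none."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1


if __name__ == "__main__":
    golden = [(i, i * 3) for i in range(10)]
    faulty = list(golden)
    faulty[6] = (6, (6 * 3) ^ 0x4)  # single bit flip in the seventh update
    fa = interval_fingerprints(golden, interval=4)
    fb = interval_fingerprints(faulty, interval=4)
    print(first_mismatch(fa, fb))  # prints 1: the fault falls in the second interval
```

In this sketch each processor would send 16 bits per interval instead of 16 bytes per update, and a divergence is noticed no later than the end of the interval in which it occurs, which is the sense in which the interval length bounds detection latency. A 16-bit hash can alias: a CRC of this width always catches a single flipped bit, but an arbitrary multi-bit corruption goes unnoticed with probability on the order of 2^-16.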


