Fingerprinting: bounding soft-error detection latency and bandwidth
Source
Architectural Support for Programming Languages and Operating Systems
Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Boston, MA, USA
SESSION: Reliability
Pages: 224 - 234
Year of Publication: 2004
ISBN: 1-58113-804-0
Authors
Jared C. Smolens, Carnegie Mellon University, Pittsburgh, PA
Brian T. Gold, Carnegie Mellon University, Pittsburgh, PA
Jangwoo Kim, Carnegie Mellon University, Pittsburgh, PA
Babak Falsafi, Carnegie Mellon University, Pittsburgh, PA
James C. Hoe, Carnegie Mellon University, Pittsburgh, PA
Andreas G. Nowatzyk, Carnegie Mellon University, Pittsburgh, PA
Bibliometrics
Downloads (6 Weeks): 11, Downloads (12 Months): 86, Citation Count: 11
ABSTRACT
Recent studies have suggested that the soft-error rate in microprocessor logic will become a reliability concern by 2010. This paper proposes an efficient error detection technique, called fingerprinting, that detects differences in execution across a dual modular redundant (DMR) processor pair. Fingerprinting summarizes a processor's execution history in a hash-based signature; differences between two mirrored processors are exposed by comparing their fingerprints. Fingerprinting tightly bounds detection latency and greatly reduces the interprocessor communication bandwidth required for checking. This paper presents a study that evaluates fingerprinting against a range of current approaches to error detection. The results of this study show that fingerprinting is the only error detection mechanism that simultaneously provides high error coverage, low error-detection bandwidth, and high I/O performance.
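The comparison described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the abstract does not say which hash is used or which state feeds it, so the example assumes a CRC-32 (Python's `zlib.crc32`) over a stream of (address, data) state updates. The names `Fingerprint`, `run_interval`, and `check_pair` are hypothetical.

```python
import zlib


class Fingerprint:
    """Running hash over a stream of state updates (hypothetical model)."""

    def __init__(self) -> None:
        self.value = 0

    def update(self, address: int, data: int) -> None:
        # Assumption: each update is an (address, data) pair of 64-bit values.
        record = address.to_bytes(8, "little") + data.to_bytes(8, "little")
        # zlib.crc32 accepts the previous CRC as a starting value,
        # so the hash accumulates across updates.
        self.value = zlib.crc32(record, self.value)

    def take(self) -> int:
        """Return the fingerprint for this interval and start a new one."""
        fp = self.value
        self.value = 0
        return fp


def run_interval(updates) -> int:
    """Fold one checking interval's worth of updates into a fingerprint."""
    fp = Fingerprint()
    for address, data in updates:
        fp.update(address, data)
    return fp.take()


def check_pair(updates_a, updates_b) -> bool:
    """True if the two mirrored executions produced matching fingerprints."""
    return run_interval(updates_a) == run_interval(updates_b)


# Two mirrored executions; the second suffers a single bit flip in one value.
golden = [(0x1000, 42), (0x1008, 7), (0x1010, 99)]
faulty = [(0x1000, 42), (0x1008, 7 ^ (1 << 3)), (0x1010, 99)]

print(check_pair(golden, list(golden)))  # matching executions
print(check_pair(golden, faulty))        # divergent executions
```

In this model only one 32-bit value per interval has to be exchanged between the two sides, however many updates the interval contains, and a mismatch is noticed no later than the end of the interval in which the executions diverge. Those are the bandwidth and latency properties the abstract attributes to fingerprinting; the interval length and hash width here are illustrative choices.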
CITED BY 11
Milo M. K. Martin, Daniel J. Sorin, Bradford M. Beckmann, Michael R. Marty, Min Xu, Alaa R. Alameldeen, Kevin E. Moore, Mark D. Hill, David A. Wood, Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, ACM SIGARCH Computer Architecture News, v.33 n.4, November 2005
Kypros Constantinides, Stephen Plaza, Jason Blome, Valeria Bertacco, Scott Mahlke, Todd Austin, Bin Zhang, Michael Orshansky, Architecting a reliable CMP switch architecture, ACM Transactions on Architecture and Code Optimization (TACO), v.4 n.1, p.2-es, March 2007