ABSTRACT
Transient faults due to particle strikes are a key challenge in microprocessor design. As transistor counts grow exponentially, per-chip fault rates are a growing burden. To protect against soft errors, designers often use redundancy techniques such as redundant multithreading (RMT). However, these techniques assume that the probability that a fault in a structure results in a soft error, known as the Architectural Vulnerability Factor (AVF), is 100 percent, which needlessly drains processor resources. Because redundancy is costly, there have been efforts to throttle RMT at runtime. To date, these methods have not incorporated an AVF model and therefore tend to be ad hoc. Unfortunately, computing the AVF of complex microprocessor structures (e.g., the issue queue, ISQ) can be quite involved. To provide probabilistic guarantees about fault tolerance, we have created a rigorous characterization of AVF behavior that can be easily implemented in hardware. We experimentally demonstrate AVF variability within and across the SPEC2000 benchmarks and identify strong correlations between structural AVF values and a small set of processor metrics. Using these simple indicators as predictors, we create a proof-of-concept RMT implementation that demonstrates that AVF prediction can be used to maintain low vulnerability without significant performance impact.
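The idea summarized above (estimate a structure's AVF, predict it from a runtime metric, and enable redundancy only when predicted vulnerability is high) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. AVF is computed with the common ACE-bit residency formulation, AVF = (ACE bits resident, summed over all cycles) / (structure bits x cycles). The predictor is an ordinary least-squares fit against a single hypothetical metric (average ISQ occupancy), whereas the abstract describes a small set of metrics. The `rmt_enabled` threshold rule and all sample numbers are hypothetical.

```python
from __future__ import annotations

# Illustrative sketch only: the ACE-residency AVF formula is a standard
# formulation; the single-metric linear predictor, the metric (ISQ
# occupancy), the threshold rule, and all numbers are hypothetical.


def avf_from_ace_residency(ace_bit_cycles: int, total_bits: int, cycles: int) -> float:
    """Fraction of the structure's bit-cycles that held ACE state.

    ace_bit_cycles is the number of ACE bits resident in the structure,
    summed over every cycle of the measurement interval.
    """
    if total_bits <= 0 or cycles <= 0:
        raise ValueError("total_bits and cycles must be positive")
    return ace_bit_cycles / (total_bits * cycles)


def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("metric has zero variance; cannot fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmt_enabled(predicted_avf: float, threshold: float) -> bool:
    """Hypothetical policy: run the redundant thread only in intervals whose
    predicted AVF exceeds the tolerated vulnerability threshold."""
    return predicted_avf > threshold


if __name__ == "__main__":
    # Hypothetical training data: per-interval ISQ occupancy vs. measured AVF.
    occupancy = [0.2, 0.4, 0.6, 0.8]
    measured_avf = [0.1, 0.2, 0.3, 0.4]
    slope, intercept = fit_linear(occupancy, measured_avf)

    threshold = 0.3  # hypothetical tolerated AVF
    for occ in (0.4, 0.9):
        predicted = slope * occ + intercept
        print(f"occupancy={occ:.1f} predicted AVF={predicted:.2f} "
              f"RMT on={rmt_enabled(predicted, threshold)}")
```

With the sample data the fitted line is roughly AVF = 0.5 x occupancy, so an interval at 90 percent occupancy (predicted AVF about 0.45) would turn RMT on under a 0.3 threshold, while one at 40 percent (about 0.2) would leave it off. A practical predictor would combine several metrics and would need validation per structure and per workload.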
CITED BY 6
David Atienza, Giovanni De Micheli, Luca Benini, José L. Ayala, Pablo G. Del Valle, Michael DeBole, Vijay Narayanan. Reliability-aware design for nanometer-scale devices. Proceedings of the 2008 Conference on Asia and South Pacific Design Automation, January 21-24, 2008, Seoul, Korea.