Skewed redundancy
Source: Proceedings of the 17th international conference on Parallel architectures and compilation techniques (PACT)
Toronto, Ontario, Canada
SESSION: CMP architecture design
Pages 62-71
Year of Publication: 2008
ISBN: 978-1-60558-282-5
ABSTRACT
Technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors. However, increasing device counts and decreasing on-chip voltage levels have made transient errors a first-order design constraint that can no longer be ignored. Several proposals have provided fault detection and tolerance by redundantly executing a program on an additional hardware thread or core. While such techniques can provide high fault coverage, they at best match the performance of the original execution and at worst incur a slowdown due to error checking, contention for shared resources, and synchronization overheads. This work pursues the same goal of detecting transient errors by redundantly executing a program on an additional processor core; however, it speeds up (rather than slows down) program execution compared to the unprotected baseline. It observes that a small number of instructions are detrimental to overall performance, and that selectively skipping them enables one core to advance far ahead of the other and obtain prefetching and large-instruction-window benefits. We highlight the modest incremental hardware required to support skewed redundancy and demonstrate speedups of 6% and 54% for collections of integer and floating-point benchmarks, respectively, while still providing 100% error detection coverage within our sphere of replication. Additionally, we show that a third core can further improve performance while adding error recovery capabilities.
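The basic idea of detecting a transient error through redundant execution can be illustrated with a toy software model. The sketch below is only an illustration under stated assumptions and is not the hardware mechanism of the paper: the names `run`, `first_mismatch`, and `PROGRAM` are hypothetical, the "program" is a short list of integer operations, and a transient error is modelled as a single bit flip in one copy's result. It does not model instruction skipping or the skew between cores.

```python
from typing import Callable, List, Optional

Op = Callable[[int], int]

# Hypothetical four-instruction "program" operating on one integer value.
PROGRAM: List[Op] = [
    lambda v: v + 3,
    lambda v: v * 2,
    lambda v: v - 1,
    lambda v: v * v,
]


def run(program: List[Op], x: int,
        flip_at: Optional[int] = None, flip_bit: int = 0) -> List[int]:
    """Execute the program and return the value produced by each instruction.

    If flip_at is given, one bit of that instruction's result is inverted
    to imitate a transient fault (an assumption made for illustration).
    """
    trace = []
    for i, op in enumerate(program):
        x = op(x)
        if i == flip_at:
            x ^= 1 << flip_bit  # simulated single-bit upset
        trace.append(x)
    return trace


def first_mismatch(a: List[int], b: List[int]) -> Optional[int]:
    """Return the index of the first differing result, or None if identical."""
    for i, (u, v) in enumerate(zip(a, b)):
        if u != v:
            return i
    return None


clean = run(PROGRAM, 5)                          # fault-free copy
faulty = run(PROGRAM, 5, flip_at=1, flip_bit=2)  # copy hit by a bit flip

print(first_mismatch(clean, run(PROGRAM, 5)))  # two fault-free copies agree
print(first_mismatch(clean, faulty))           # divergence exposes the fault
```

Comparing per-instruction results, rather than only the final value, is a design choice of this sketch; it lets the checker report where the two copies first diverge.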