Skewed redundancy
Source
PACT
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
Toronto, Ontario, Canada
Session: CMP architecture design
Pages 62-71
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Gordon B. Bell, IBM, Research Triangle Park, NC, USA
Mikko H. Lipasti, University of Wisconsin - Madison, Madison, WI, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1454115.1454126

ABSTRACT

Technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors. However, increasing device counts and decreasing on-chip voltage levels have made transient errors a first-order design constraint. Several proposals provide fault detection and tolerance by redundantly executing a program on an additional hardware thread or core. While such techniques can provide high fault coverage, they at best match the performance of the original execution and at worst incur a slowdown due to error checking, contention for shared resources, and synchronization overheads. This work likewise detects transient errors by redundantly executing a program on an additional processor core; however, it speeds up, rather than slows down, program execution compared to the unprotected baseline. It observes that a small number of instructions are detrimental to overall performance, and that selectively skipping them enables one core to advance far ahead of the other and obtain prefetching and large-instruction-window benefits. We highlight the modest incremental hardware required to support skewed redundancy and demonstrate speedups of 6% and 54% for a collection of integer and floating-point benchmarks, respectively, while still providing 100% error detection coverage within our sphere of replication. Additionally, we show that a third core can further improve performance while adding error recovery capabilities.
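
As a rough illustration of the general idea the abstract builds on (detecting a transient error by running the same program twice and comparing results), the following toy sketch may help. It is not the paper's skewed-redundancy mechanism and models no instruction skipping, prefetching, or hardware detail. The instruction format, the names execute and first_mismatch, and the single bit-flip fault model are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical toy instruction format: (opcode, operand).
Instr = Tuple[str, int]


def execute(program: List[Instr], flip_at: Optional[int] = None) -> List[int]:
    """Run the toy program; return the accumulator after each instruction.

    If flip_at is given, bit 0 of the accumulator is flipped right after
    that instruction index, modelling a single transient fault (assumed
    fault model, chosen only for illustration).
    """
    acc = 0
    trace = []
    for i, (op, arg) in enumerate(program):
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown opcode: {op}")
        if i == flip_at:
            acc ^= 1  # inject the transient bit flip
        trace.append(acc)
    return trace


def first_mismatch(a: List[int], b: List[int]) -> Optional[int]:
    """Return the first index where the two traces differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


program = [("add", 3), ("mul", 4), ("add", 5), ("mul", 2)]
clean = execute(program)               # stands in for one core's results
faulty = execute(program, flip_at=1)   # redundant copy hit by a fault

print(first_mismatch(clean, faulty))            # index of first divergence
print(first_mismatch(clean, execute(program)))  # None when both copies agree
```

Comparing the two traces flags the injected fault at the instruction where the copies first diverge. In this toy model the redundant copy does no less work than the original, which corresponds to the "at best equivalent performance" limitation the abstract attributes to prior schemes.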


