| A light-weight cache-based fault detection and checkpointing scheme for MPSoCs enabling relaxed execution synchronization |
| Full text |
Pdf
(443 KB)
|
Source
|
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
archive
Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
table of contents
Atlanta, GA, USA
SESSION: Resiliency
table of contents
Pages 11-20
Year of Publication: 2008
ISBN:978-1-60558-469-0
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 11, Downloads (12 Months): 72, Citation Count: 0
|
|
|
ABSTRACT
While technology advances have made MPSoCs a standard architecture for embedded systems, their applicability is increasingly being challenged by dramatic increases in the amount of device failures that may occur during execution. Conventional fault tolerance techniques employ a duplication-and-comparison strategy to detect arbitrary execution faults, as well as a checkpointing-and-rollback strategy to recover from the faulty state. Comparison and checkpointing are performed either at task level, thus imposing a large amount of overhead in verifying and backing up memory pages, or at instruction level, thus necessitating a lock-step execution model which significantly limits the attainable performance. To overcome the shortcomings of both strategies, in this paper we propose a cache-based fault tolerance scheme wherein the comparison and checkpointing process is performed at the cache-memory interface. By allowing two processors that execute duplicated tasks to share a single data cache, the proposed scheme is able to verify execution results before writing them back into memory, thus protecting the memory from being polluted by execution faults. This in turn significantly reduces the checkpointing overhead. Meanwhile, since only the data written into memory are compared, the strict instruction-by-instruction synchronization model used in multithreading processors can be relaxed. The simulation results confirm that the proposed scheme only imposes a performance overhead ranging from 1.4% to 10.4%, while both fault detection and execution checkpointing can be effectively attained.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
International Technology Roadmap for Semiconductors (ITRS), 2007 Edition. "Process integration, devices, and structures".
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
 |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
A. Wood, "Data integrity concepts, features, and technology," White paper, Tandem divison, Compaq Computer Corporation.
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
D. B. Hunt and P. N. Marinos, "A general purpose cache-aided rollback error recovery (CARER) technique," In Proc. FTCS-17, pp. 170--175, 1987.
|
| |
14
|
|
| |
15
|
Chunho Lee , Miodrag Potkonjak , William H. Mangione-Smith, MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.330-335, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
16
|
|
|