Fault-tolerant typed assembly language
Source
ACM SIGPLAN Notices, Volume 42, Issue 6 (June 2007)
Proceedings of the 2007 PLDI conference
Session: Compiled correctly
Pages: 42-53
Year of Publication: 2007
ISSN: 0362-1340
Authors
Frances Perry, Princeton University, Princeton, NJ
Lester Mackey, Princeton University, Princeton, NJ
George A. Reis, Princeton University, Princeton, NJ
Jay Ligatti, University of South Florida, Tampa, FL
David I. August, Princeton University, Princeton, NJ
David Walker, Princeton University, Princeton, NJ
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1273442.1250741

ABSTRACT

A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. Although transient faults do not permanently damage the hardware, they may corrupt computations by altering stored values and signal transfers. In this paper, we propose a new scheme for provably safe and reliable computing in the presence of transient hardware faults. In our scheme, software computations are replicated to provide redundancy, and special instructions compare the independently computed results to detect errors before critical data is written. In contrast to all previous efforts in this area, we analyze our fault-tolerance scheme from a formal, theoretical perspective. Specifically, we first provide an operational semantics for our assembly language, including a precise formal definition of our fault model. Second, we develop an assembly-level type system designed to detect reliability problems in compiled code. Third, we give a formal specification of program fault tolerance under this fault model and prove that all well-typed programs are indeed fault tolerant. In addition to the formal analysis, we evaluate our detection scheme and show that protected programs take only 34% longer to execute than their unreliable versions.
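The replicate-and-compare idea can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the paper's assembly language or semantics: the instruction encoding, the register names (a "green" copy `g*` and a "blue" copy `b*`), the immediate store address, and the fault model (one bit flip in one register after any single instruction) are all hypothetical. A checked `store` writes memory only when the two copies agree. Enumerating every single-fault injection then shows that each run either produces the fault-free result or stops with a detected fault, and never silently writes a wrong value.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction encoding (illustrative only, not the paper's syntax):
#   ("mov", dst, imm)         dst := imm
#   ("add", dst, src1, src2)  dst := src1 + src2
#   ("store", addr, g, b)     checked store: write regs[g] to mem[addr] only if
#                             the green copy g equals the blue copy b
Instr = Tuple


class FaultDetected(Exception):
    """Raised when a checked store sees the two copies disagree."""


def run(program: List[Instr], fault: Optional[Tuple[int, str]] = None) -> Dict[int, int]:
    """Execute program. fault = (step, reg) flips the low bit of reg right after
    instruction `step` executes (an assumed single-event-upset model)."""
    regs: Dict[str, int] = {}
    mem: Dict[int, int] = {}
    for step, ins in enumerate(program):
        op = ins[0]
        if op == "mov":
            regs[ins[1]] = ins[2]
        elif op == "add":
            regs[ins[1]] = regs[ins[2]] + regs[ins[3]]
        elif op == "store":
            _, addr, g, b = ins
            if regs[g] != regs[b]:
                raise FaultDetected(f"copies disagree at step {step}")
            mem[addr] = regs[g]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        # Inject at most one transient fault, only into a register that exists.
        if fault is not None and fault[0] == step and fault[1] in regs:
            regs[fault[1]] ^= 1
    return mem


# The computation 2 + 3, done once in green registers and once in blue ones.
PROGRAM: List[Instr] = [
    ("mov", "g1", 2), ("mov", "b1", 2),
    ("mov", "g2", 3), ("mov", "b2", 3),
    ("add", "g3", "g1", "g2"), ("add", "b3", "b1", "b2"),
    ("store", 100, "g3", "b3"),
]

# A flawed variant: the store compares the green result with itself.
UNSAFE_PROGRAM: List[Instr] = PROGRAM[:-1] + [("store", 100, "g3", "g3")]


def single_fault_outcomes(program: List[Instr]) -> Tuple[Dict[int, int], List[str]]:
    """Classify every single-fault run as 'ok' (same memory as the fault-free run),
    'detected' (FaultDetected raised) or 'sdc' (silent data corruption)."""
    golden = run(program)
    reg_names = sorted({ins[1] for ins in program if ins[0] != "store"})
    outcomes: List[str] = []
    for step in range(len(program)):
        for reg in reg_names:
            try:
                mem = run(program, fault=(step, reg))
                outcomes.append("ok" if mem == golden else "sdc")
            except FaultDetected:
                outcomes.append("detected")
    return golden, outcomes


if __name__ == "__main__":
    for name, prog in [("replicated", PROGRAM), ("unsafe", UNSAFE_PROGRAM)]:
        golden, outcomes = single_fault_outcomes(prog)
        print(name, golden, {k: outcomes.count(k) for k in ("ok", "detected", "sdc")})
```

Swapping in a store that compares `g3` against itself shows why the redundant copies must be independent: a single fault in `g3` (or in either green input) then reaches memory undetected. Testing can only sample such mistakes; ruling them out statically for all programs is the job the abstract assigns to the assembly-level type system.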



