ABSTRACT
This paper presents a software architecture for hardware fault tolerance based on loosely synchronized, redundant virtual machines (LSRVM). LSRVM will provide high levels of reliability by tolerating hardware faults at all levels of the system. Historically, such hardware fault tolerance has been achievable only with custom-designed hardware and proprietary operating systems. Today, however, technological trends and economic factors are driving a reduction in the amount of custom-designed hardware. We believe that this path should be followed to its ultimate conclusion: a highly available, fault-tolerant computing system based entirely on commodity hardware and open-source operating systems. Our approach uses virtualization to provide redundancy efficiently on modern commodity hardware. When combined with existing application-level fault tolerance mechanisms, LSRVM will provide very high levels of reliability at extremely low cost.