|
ABSTRACT
Wait-free implementations of shared objects tolerate the failure of processes, but not the failure of base objects from which they are implemented. We consider the problem of implementing shared objects that tolerate the failure of both processes and base objects.We identify two classes of object failures: responsive and nonresponsive. With responsive failures, a faulty object responds to every operation, but its responses may be incorrect. With nonresponsive failures, a faulty object may also “hang” without responding. In each class, we define crash, omission, and arbitrary modes of failure.We show that all responsive failure modes can be tolerated. More precisely, for all responsive failure modes F, object types T, and t &ohgr; 0, we show how to implement a shared object of type T which is t-tolerant for F. Such an object remains correct and wait-free even if up to t base objects fail according to F. In contrast to responsive failures, we show that even the most benign non-responsive failure mode cannot be tolerated. We also show that randomization can be used to circumvent this impossibility result.Graceful degradation is a desirable property of fault-tolerant implementations: the implemented object never fails more severely than the base objects it is derived from, even if all the base objects fail. For several failure modes, we show wheter this property can be achieved, and, if so, how.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Yehuda Afek , David S. Greenberg , Michael Merritt , Gadi Taubenfeld, Computing with faulty shared memory, Proceedings of the eleventh annual ACM symposium on Principles of distributed computing, p.47-58, August 10-12, 1992, Vancouver, British Columbia, Canada
[doi> 10.1145/135419.135431]
|
 |
2
|
|
 |
3
|
|
| |
4
|
BERMAN, P., GARAY, J., AND PERRY, K. J. 1989. Towards optimal distributed consensus. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE, New York, pp. 410-415.
|
 |
5
|
|
 |
6
|
|
| |
7
|
|
| |
8
|
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
 |
13
|
|
| |
14
|
HALDAR, S., AND VIDYASANKAR, K. 1991. A simple construction of 1-writer multi-reader multivalued atomic variable from regular variables. Tech Rep. No. 9108. Department of Computer Science, Memorial University of Newfoundland, St. John's, NF, Canada.
|
 |
15
|
|
 |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
|
| |
20
|
|
 |
21
|
|
 |
22
|
|
| |
23
|
LAMPORT, L. 1978. The implementation of reliable distributed multi-process systems. Comput. Netw. 2, 95-114.
|
| |
24
|
LAMPORT, L. 1986.On interprocess communication, Parts I and II. Distr. Comput. 1, 77-101.
|
 |
25
|
|
| |
26
|
LouI, M. C., AND ABU-AMARA. 1987. Memory requirements for agreement among unreliable asynchonous processes. Adv. Comput. Res. 4, 163-183.
|
| |
27
|
LYNCH, N., AND TUTTLE, M. 1988. An introduction to input/output automata. Tech. Rep. MIT/ LCS/TM-373. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
|
 |
28
|
Richard Newman-Wolfe, A protocol for wait-free, atomic, multi-reader shared variables, Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, p.232-248, August 10-12, 1987, Vancouver, British Columbia, Canada
[doi> 10.1145/41840.41860]
|
 |
29
|
|
 |
30
|
|
| |
31
|
PETERSON, G., AND BURNS, J. 1987. Concurrent reading while writing II: The multi-writer case. In Proceedings of the 28th Annual Symposium on Foundations of Computer Science. IEEE, New York.
|
 |
32
|
|
| |
33
|
SCHAFFER, R. 1988. On the correctness of atomic multi-writer registers. Tech. Rep. TR No.: MIT/LCS/TM-364. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
|
 |
34
|
|
 |
35
|
Ambuj K. Singh , James H. Anderson , Mohamed G. Gouda, The elusive atomic register revisited, Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, p.206-221, August 10-12, 1987, Vancouver, British Columbia, Canada
[doi> 10.1145/41840.41858]
|
| |
36
|
SRIKANTH, T. K., AND TOUEG, S. 1987. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Dist. Comput. 2, 2, 80-94.
|
| |
37
|
|
| |
38
|
|
| |
39
|
VITANYI, P., AND AWERBUCH, B. 1986. Atomic shared register access by asynchonous hardware. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science. IEEE, New York.
|
CITED BY 21
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ittai Abraham , Gregory V. Chockler , Idit Keidar , Dahlia Malkhi, Byzantine disk paxos: optimal resilience with byzantine shared memory, Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, July 25-28, 2004, St. John's, Newfoundland, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|