Rewind, repair, replay: three R's to dependability
Source: Proceedings of the 10th ACM SIGOPS European Workshop, Saint-Emilion, France
Session: Robust service
Pages: 70-77
Year of Publication: 2002
Authors
Aaron B. Brown  University of California at Berkeley, Berkeley, CA
David A. Patterson  University of California at Berkeley, Berkeley, CA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1133373.1133387

ABSTRACT

Motivated by the growth of web and infrastructure services and their susceptibility to human operator-related failures, we introduce system-level undo as a recovery mechanism designed to improve service dependability. Undo enables system operators to recover from their inevitable mistakes and furthermore enables retroactive repair of problems that were not fixed quickly enough to prevent detrimental effects. We present the "three R's", a model of undo that matches the needs of human error recovery and retroactive repair; discuss several of the issues raised by this undo model; and introduce an initial architectural framework for undoable systems using the example of an undoable e-mail service system.
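
The abstract names the three steps but does not spell out a mechanism, so the following is only a toy sketch of one way a rewind, repair, replay cycle could look for a mail store. Everything in it is an assumption made for illustration: the class `UndoableMailStore`, its methods, the in-memory snapshot, and the word-based filter are hypothetical and are not taken from the paper's architecture.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class UndoableMailStore:
    """Toy in-memory mail store (hypothetical; not the paper's design)."""
    blocked_words: set = field(default_factory=set)   # operator-managed filter
    inbox: list = field(default_factory=list)         # messages visible to the user
    _log: list = field(default_factory=list)          # every delivery request, in order
    _checkpoints: dict = field(default_factory=dict)  # name -> (inbox, filter, log position)

    def checkpoint(self, name):
        # Snapshot the state and remember where the log stood at that moment.
        self._checkpoints[name] = (
            copy.deepcopy(self.inbox), set(self.blocked_words), len(self._log))

    def deliver(self, message, record=True):
        # Log the user-visible request, then apply it under the current filter.
        if record:
            self._log.append(message)
        if not any(word in message for word in self.blocked_words):
            self.inbox.append(message)

    def rewind(self, name):
        # Rewind: restore the snapshot; return the log position to replay from.
        inbox, blocked_words, position = self._checkpoints[name]
        self.inbox = copy.deepcopy(inbox)
        self.blocked_words = set(blocked_words)
        return position

    def replay(self, position):
        # Replay: re-apply logged requests against the repaired system
        # without logging them a second time.
        for message in self._log[position:]:
            self.deliver(message, record=False)


store = UndoableMailStore()
store.checkpoint("before-mistake")
store.deliver("hello")
store.deliver("buy pills now")         # slips through: the filter was never set up

position = store.rewind("before-mistake")  # rewind
store.blocked_words.add("pills")           # repair: the operator fixes the filter
store.replay(position)                     # replay the logged deliveries
```

After the cycle, the inbox reflects what would have happened had the filter been in place from the start, while the request log itself is left intact. A real service would also have to deal with state the user has already seen, which this sketch ignores.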

