ABSTRACT
Emerging Web services, such as email, photo sharing, and web site archives, must preserve large volumes of quickly accessible data indefinitely into the future. The costs of doing so often determine whether the service is economically viable. We make the case that these applications' demands on large-scale storage systems over long time horizons require us to reevaluate traditional system designs. We examine threats to long-lived data from an end-to-end perspective, taking into account not just hardware and software faults but also faults due to humans and organizations. We present a simple model of long-term storage failures that helps us reason about various strategies for addressing some of these threats. Using this model, we show that the most important strategies for increasing the reliability of long-term storage are detecting latent faults quickly, automating fault repair to make it cheaper and faster, and increasing the independence of data replicas.