One hundred years of data
Source: dg.o; Vol. 151
Proceedings of the 2006 international conference on Digital government research
San Diego, California
SESSION: Invited talks
Pages: 3-4
Year of Publication: 2006
Author
Fran Berman  San Diego Supercomputer Center, UC San Diego
Sponsor
NSF : National Science Foundation
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1146598.1146600

ABSTRACT

The 20th century brought about an "information revolution" that has forever altered the way we work, communicate, and live. In the 21st century, it is hard to imagine working without an increasingly broad array of enabling technologies and the data they provide. Much of this data will form the foundation for new discovery, advances, and policy over the next 100 years and beyond.

The care and management of today's tidal wave of data has become an increasingly important focus for technology development. Collecting, providing, and preserving data responsibly presents both an opportunity and a challenge. Whereas books can be preserved for years and even centuries, the preservation of digital data depends on the technologies on which it is stored. In the next 100 years, storage technologies will advance tens of generations, and digital collections preserved on up-to-date storage technologies will need to transition through each new generation, many times over.

Without a planned approach to preservation, valuable data will be damaged or lost. The stakes are high: some data collections, such as the Shoah Collection of Holocaust survivor testimony, are irreplaceable, and others, such as the longitudinal Panel Study of Income Dynamics used by social scientists and the Protein Data Bank used by biologists, are fundamental research tools. The challenges of responsible data preservation are great. Key questions that must be addressed in the preservation of long-lived digital data include:

1) What should we save?
We can't save everything, and even if we could, it would be exceedingly difficult to find useful information within the mass of data. Some data collections will need to be marked for preservation from the outset, and some collections will need to be "rescued".

2) Who is responsible?
Digital collections are of interest to many constituents: data generators, users, stewards, etc. Who is responsible for preserving the digital data over the long term? Who will pay for upkeep, technology transition, and the development of tools and interfaces to make the data accessible?

3) How do we keep data safe?
Digital media is more fragile than paper. Software bugs, power outages, hackers, and other problems threaten the reliability of digital collections. The risks can be mitigated when multiple copies of the data collection are generated and updated consistently.

4) How should we save it?
Communities vary widely in their usage patterns, formats, standards, and policies with respect to digital data. The cyberinfrastructure in which digital repositories are embedded must provide reliable, safe, and usable access. Data collections must be available for analysis, modeling, public access, dissemination, and other types of usage.

These questions and many other challenges must be addressed for responsible digital preservation. In this talk we focus on the development and deployment of cyberinfrastructure for data management and preservation, and the challenges of developing a framework for data management and preservation over the foreseeable future.