Archiving scientific data
Source: International Conference on Management of Data
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Madison, Wisconsin
SESSION: Research session: data warehousing and archive
Pages: 1 - 12  
Year of Publication: 2002
ISBN:1-58113-497-5
Authors
Peter Buneman, University of Edinburgh and University of Pennsylvania
Sanjeev Khanna, University of Pennsylvania
Keishi Tajima, Japan Advanced Institute of Science and Technology
Wang-Chiew Tan, University of Pennsylvania
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/564691.564693

ABSTRACT

We present an archiving technique for hierarchical data with key structure. Our approach is based on timestamps: an element that appears in multiple versions of the database is stored only once, together with a compact description of the versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequence of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that provides meaningful change descriptions and efficiently supports a variety of basic functions concerning the evolution of data, such as retrieving any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead when compared with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, because the resulting archive is itself in XML, it can directly leverage existing XML tools.
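
The timestamping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool or algorithm: it assumes each version is a nested dictionary whose dictionary keys play the role of element keys, that leaf values are hashable, and that versions are numbered 1, 2, 3, and so on. The names ArchiveNode, merge, retrieve, history, and intervals are hypothetical, and the sample data is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveNode:
    """One keyed element, stored once for every version in which it appears."""
    versions: set = field(default_factory=set)    # timestamp: versions containing this element
    values: dict = field(default_factory=dict)    # leaf value -> versions holding that value
    children: dict = field(default_factory=dict)  # element key -> ArchiveNode


def merge(node, data, version):
    """Merge one version (nested dicts, hypothetical input format) into the archive."""
    node.versions.add(version)
    if isinstance(data, dict):
        for key, sub in data.items():
            merge(node.children.setdefault(key, ArchiveNode()), sub, version)
    else:
        node.values.setdefault(data, set()).add(version)


def retrieve(node, version):
    """Rebuild a single version without undoing any deltas."""
    for value, stamps in node.values.items():
        if version in stamps:
            return value
    return {key: retrieve(child, version)
            for key, child in node.children.items()
            if version in child.versions}


def history(root, path):
    """Return the sorted versions in which the element at `path` exists."""
    node = root
    for key in path:
        node = node.children[key]
    return sorted(node.versions)


def intervals(versions):
    """Compress a set of version numbers into (first, last) runs."""
    runs = []
    for v in sorted(versions):
        if runs and v == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], v)
        else:
            runs.append((v, v))
    return runs


# Made-up sample data: element "b" is absent from version 2,
# and the value under "a" changes in version 3.
v1 = {"a": {"seq": "ATG"}, "b": {"seq": "CCA"}}
v2 = {"a": {"seq": "ATG"}}
v3 = {"a": {"seq": "ATGA"}, "b": {"seq": "CCA"}}

archive = ArchiveNode()
for number, snapshot in enumerate([v1, v2, v3], start=1):
    merge(archive, snapshot, number)
```

In this sketch, retrieve(archive, 2) rebuilds the second version directly, and intervals(history(archive, ["b"])) summarizes when element "b" was present. A diff-based archive would instead need to replay or reverse deltas to answer either question.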


