A survey of data provenance in e-science
Source: ACM SIGMOD Record
Volume 34, Issue 3 (September 2005)
Column: Special section on scientific workflows
Pages: 31-36
Year of Publication: 2005
ISSN: 0163-5808
Authors
Yogesh L. Simmhan, Indiana University, Bloomington, IN
Beth Plale, Indiana University, Bloomington, IN
Dennis Gannon, Indiana University, Bloomington, IN
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1084805.1084812

ABSTRACT

Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. In this paper we create a taxonomy of data provenance characteristics and apply it to current research efforts in e-science, focusing primarily on scientific workflow approaches. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The survey culminates with an identification of open research problems in the field.

