ABSTRACT
Provenance in the context of workflows, both for the data they derive and for their specification, is essential for result reproducibility, sharing, and knowledge reuse in the scientific community. Several workshops have been held on the topic, and it has been the focus of many research projects and prototype systems. This tutorial provides an overview of research issues in provenance for scientific workflows, with a focus on recent literature and technology in this area. It is aimed at a general database research audience and at people who work with scientific data and workflows. We will (1) provide a general overview of scientific workflows; (2) describe research on provenance for scientific workflows and show in detail how provenance is supported in existing systems; (3) discuss emerging applications that are enabled by provenance; and (4) outline open problems and new directions for database-related research.