ABSTRACT
Scientific research relies as much on the dissemination and exchange of data sets as on the publication of conclusions. Accurately tracking the lineage (origin and subsequent processing history) of scientific data sets is thus imperative for the complete documentation of scientific work. However, researchers are effectively prevented from determining, preserving, or providing the lineage of the computational data products they use and create, because there is no definitive model for lineage retrieval and current data management tools fit poorly with scientific software. Based on a comprehensive survey of lineage research and previous prototypes, we present a metamodel to help identify and assess the basic components of systems that provide lineage retrieval for scientific data products.