ABSTRACT
This paper describes our experiences in exploring the applicability of software engineering approaches to scientific data management problems. Specifically, this paper describes how process definition languages can be used to expedite production of scientific datasets as well as to generate documentation of their provenance. Our approach uses a process definition language that incorporates powerful semantics to encode scientific processes in the form of a Process Definition Graph (PDG). The paper describes how execution of the PDG-defined process can generate Dataset Derivation Graphs (DDGs), metadata that document how the scientific process developed each of its product datasets. The paper uses an example to show that scientific processes may be complex and to illustrate why some of the more powerful semantic features of the process definition language are useful in supporting clarity and conciseness in representing such processes. This work is similar in goals to work generally referred to as Scientific Workflow. The paper demonstrates the contribution that software engineering can make to this domain.