| Optimizing user views for workflows |
| Full text |
Pdf
(880 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 361
archive
Proceedings of the 12th International Conference on Database Theory
table of contents
St. Petersburg, Russia
SESSION: Provenance
table of contents
Pages 310-323
Year of Publication: 2009
ISBN:978-1-60558-423-2
|
|
Authors
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 10, Downloads (12 Months): 74, Citation Count: 1
|
|
|
ABSTRACT
A technique called user views has recently been proposed to focus user attention on relevant information in response to provenance queries over workflow executions [1, 2]: Given user input on what modules in the workflow specification are relevant to the user, a user view is a concise representation that clusters together modules to create a small number of composite modules (or clusters) such that (1) each composite module in a user view contains at most one relevant (atomic) module, thus assuming the "meaning" of that module; and (2) no control or data dependencies (either direct or indirect) are introduced (soundness) or removed (completeness) between relevant modules. The goal is to find a user view with a smallest number of composite modules. We show that for workflow specifications that are general graphs, regardless of the number of distinct modules in the input workflow and the structure of interaction between them, there always exists a user view of size at most (2k--1 -- k)2 + k, where k is the number of relevant modules. Moreover, a good user view with at most (2k--1 -- k)2 + k clusters can be computed in polynomial time in the size of the graph. We also show that this upper bound is tight. Thus in general graphs, the number of composite modules can be exponentially large in k even in an optimum user view for the specification. We also give a characterization of a good user view in terms of structural properties of each cluster in the user view. However, for series-parallel workflow graphs, we show that there is always a user-view with at most 2k -- 3 composite modules; further, there exist series-parallel graphs where every user view requires at least 2k -- 3 composite modules. Such graphs capture the structure of many scientific and other workflows that we have encountered in practice. For this class of graphs, we give a simple, linear time algorithm for constructing an optimum user view for a given specification.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
O. Biton, S. Cohen-Boulakia, S. Davidson, and C. Hara. Querying and managing provenance through user views in scientific workflows. In ICDE, 2008.
|
 |
3
|
|
 |
4
|
|
| |
5
|
S. Bowers and B. Ludäscher. Actor-oriented design of scientific workflows. In ER, 2005.
|
| |
6
|
|
| |
7
|
BPEL. business process execution language for web services. http://www-128.ibm.com/developerworks/library/specification/ws-bpel/.
|
| |
8
|
|
| |
9
|
S. B. Davidson, S. Cohen-Boulakia, A. Eyal, B. Ludĺascher, T. McPhillips, S. Bowers, M. K. Anand, and J. Freire. Provenance in scientific workflow systems. IEEE Data Engineering Bulletin, 30(4):44--50, December 2007.
|
 |
10
|
|
| |
11
|
Ian T. Foster , Jens-S. Vöckler , Michael Wilde , Yong Zhao, Chimera: AVirtual Data System for Representing, Querying, and Automating Data Derivation, Proceedings of the 14th International Conference on Scientific and Statistical Database Management, p.37-46, July 24-26, 2002
[doi> 10.1109/SSDM.2002.1029704]
|
| |
12
|
J. Freire, C. T. Silva, S. P. Callahan, E. Santos, C. E. Scheidegger, and H. T. Vo. Managing rapidly-evolving scientific workflows. In IPAW, 2006.
|
| |
13
|
L. Moreau and B. Ludäscher, editors. Concurrency and Computation: Practice and Experience -- Special Issue on the First Provenance Challenge. Wiley, 2007 (in press). (see also http://twiki.ipaw.info/bin/view/Challenge/).
|
| |
14
|
Tom Oinn , Matthew Addis , Justin Ferris , Darren Marvin , Martin Senger , Mark Greenwood , Tim Carver , Kevin Glover , Matthew R. Pocock , Anil Wipat , Peter Li, Taverna: a tool for the composition and enactment of bioinformatics workflows, Bioinformatics, v.20 n.17, p.3045-3054, November 2004
[doi> 10.1093/bioinformatics/bth361]
|
 |
15
|
|
 |
16
|
Jacobo Valdes , Robert E. Tarjan , Eugene L. Lawler, The recognition of Series Parallel digraphs, Proceedings of the eleventh annual ACM symposium on Theory of computing, p.1-12, April 30-May 02, 1979, Atlanta, Georgia, United States
[doi> 10.1145/800135.804393]
|
CITED BY
|
|
Peng Sun , Ziyang Liu , Susan B. Davidson , Yi Chen, Detecting and resolving unsound workflow views for correct provenance analysis, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|