Provenance for nested subqueries
Source: Extending Database Technology, Vol. 360
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Saint Petersburg, Russia
Session: Research sessions: Provenance
Pages 982-993
Year of Publication: 2009
ISBN: 978-1-60558-422-5
Authors:
Boris Glavic, University of Zurich
Gustavo Alonso, ETH Zurich
Publisher:
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1516360.1516472

ABSTRACT

Data provenance is essential in applications such as scientific computing, curated databases, and data warehouses. Several systems have been developed that provide provenance functionality for the relational data model. These systems support only a subset of SQL, which is a severe limitation in practice, since most of the application domains that benefit from provenance information use complex queries. Such queries typically involve nested subqueries, aggregation and/or user-defined functions. Without support for these constructs, a provenance management system is of limited use.

In this paper we address this limitation by exploring the problem of provenance derivation when complex queries are involved. More precisely, we demonstrate that the widely used definition of Why-provenance fails in the presence of nested subqueries, and show how the definition can be modified to produce meaningful results for nested subqueries. We further present query rewrite rules that transform an SQL query into a query that propagates provenance. The solution introduced in this paper allows us to track provenance information for a far wider subset of SQL than any existing approach. We have incorporated these ideas into the engine of the Perm provenance management system and used it to evaluate the feasibility and performance of our approach.


