ACM Home Page
Please provide us with feedback. Feedback
A compositional framework for complex queries over uncertain data
Full text PdfPdf (604 KB)
Source ACM International Conference Proceeding Series; Vol. 361 archive
Proceedings of the 12th International Conference on Database Theory table of contents
St. Petersburg, Russia
SESSION: Uncertain databases table of contents
Pages 149-161  
Year of Publication: 2009
ISBN:978-1-60558-423-2
Authors
Michaela Götz  Cornell University, Ithaca, NY
Christoph Koch  Cornell University, Ithaca, NY
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 53,   Citation Count: 1
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1514894.1514913
What is a DOI?

ABSTRACT

The ability to flexibly compose confidence computation with the operations of relational algebra is an important feature of probabilistic database query languages. Computing confidences is computationally hard, however, and has to be approximated in practice. In a compositional query language, even very small errors caused by approximation can lead to an entirely incorrect result: A selection operation on an approximated probability can incorrectly keep or drop a tuple even if the probability value has been approximated to a very narrow confidence interval.

In this paper, we study the query evaluation problem for compositional query languages for probabilistic databases with particular focus on providing overall result quality guarantees in the face of approximate intermediate results. We present a framework for evaluating compositional queries based on a new representation system that can capture uncertainty about probabilities. More specifically, we consider probability intervals instead of exact probabilities, interpreting tuples obtained by selection on approximate values as unreliable.

We study the complexity of query evaluation over our new model. We present efficient confidence computation algorithms which compute bounds that are close to tight for important classes. For deciding a selection predicate, we show that no efficient randomized algorithm exists unless BPP⊃NP. Still we are able to efficiently guess robust predicates with a good error bound. Putting all these pieces together in our framework, we evaluate queries using a decomposition into a relational algebra plan and an approximation plan. The latter allows to successively improve accuracy and error bounds, while the relational algebra plan only has to be executed once.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
Lyublena Antova, Thomas Jansen, Christoph Koch, and Dan Olteanu. "Fast and Simple Relational Processing of Uncertain Data". In Proc. ICDE, 2008.
 
4
Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. "Conditional Functional Dependencies for Data Cleaning". In Proc. ICDE, 2007.
5
 
6
7
8
 
9
10
11
 
12
 
13
14
 
15
16
17
 
18
Luca Trevisan. A note on approximate counting for k-dnf. In Proc. APPROX-RANDOM, 2004.
 
19
Leslie G. Valiant. "The Complexity of Enumeration and Reliability Problems". SIAM J. Comput., 8(3), 1979.
 
20
Jennifer Widom. "TRIO: A System for Managing Data, Uncertainty, and Lineage". In Charu Agarwal, editor, Managing and Mining Uncertain Data, 2008. To appear.
 
21
Wenzhong Zhao, Alex Dekhtyar, and Judy Goldsmith. Query algebra operations for interval probabilities. In Proc. DEXA, 2003.
 
22

Collaborative Colleagues:
Michaela Götz: colleagues
Christoph Koch: colleagues