|
ABSTRACT
The ability to flexibly compose confidence computation with the operations of relational algebra is an important feature of probabilistic database query languages. Computing confidences is computationally hard, however, and has to be approximated in practice. In a compositional query language, even very small errors caused by approximation can lead to an entirely incorrect result: A selection operation on an approximated probability can incorrectly keep or drop a tuple even if the probability value has been approximated to a very narrow confidence interval. In this paper, we study the query evaluation problem for compositional query languages for probabilistic databases with particular focus on providing overall result quality guarantees in the face of approximate intermediate results. We present a framework for evaluating compositional queries based on a new representation system that can capture uncertainty about probabilities. More specifically, we consider probability intervals instead of exact probabilities, interpreting tuples obtained by selection on approximate values as unreliable. We study the complexity of query evaluation over our new model. We present efficient confidence computation algorithms which compute bounds that are close to tight for important classes. For deciding a selection predicate, we show that no efficient randomized algorithm exists unless BPP⊃NP. Still we are able to efficiently guess robust predicates with a good error bound. Putting all these pieces together in our framework, we evaluate queries using a decomposition into a relational algebra plan and an approximation plan. The latter allows to successively improve accuracy and error bounds, while the relational algebra plan only has to be executed once.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
Lyublena Antova, Thomas Jansen, Christoph Koch, and Dan Olteanu. "Fast and Simple Relational Processing of Uncertain Data". In Proc. ICDE, 2008.
|
| |
4
|
Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. "Conditional Functional Dependencies for Data Cleaning". In Proc. ICDE, 2007.
|
 |
5
|
|
| |
6
|
|
 |
7
|
|
 |
8
|
Erich Grädel , Yuri Gurevich , Colin Hirsch, The complexity of query reliability, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.227-234, June 01-04, 1998, Seattle, Washington, United States
[doi> 10.1145/275487.295124]
|
| |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
| |
18
|
Luca Trevisan. A note on approximate counting for k-dnf. In Proc. APPROX-RANDOM, 2004.
|
| |
19
|
Leslie G. Valiant. "The Complexity of Enumeration and Reliability Problems". SIAM J. Comput., 8(3), 1979.
|
| |
20
|
Jennifer Widom. "TRIO: A System for Managing Data, Uncertainty, and Lineage". In Charu Agarwal, editor, Managing and Mining Uncertain Data, 2008. To appear.
|
| |
21
|
Wenzhong Zhao, Alex Dekhtyar, and Judy Goldsmith. Query algebra operations for interval probabilities. In Proc. DEXA, 2003.
|
| |
22
|
|
|