|
ABSTRACT
We address the problem of finding a "best" deterministic query answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer) which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). This problem can be seen as a generalization of the well-studied inconsistent information aggregation problems (e.g. rank aggregation) to probabilistic databases. We consider this problem for various types of queries including SPJ queries, Top-k ranking queries, group-by aggregate queries, and clustering. For different distance metrics, we obtain polynomial time optimal or approximation algorithms for computing the consensus answers (or prove NP-hardness). Most of our results are for a general probabilistic database model, called and/xor tree model, which significantly generalizes previous probabilistic database models like x-tuples and block-independent disjoint models, and is of independent interest.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
| |
6
|
|
 |
7
|
|
| |
8
|
|
 |
9
|
|
| |
10
|
|
 |
11
|
|
| |
12
|
Amol Deshpande , Carlos Guestrin , Samuel R. Madden , Joseph M. Hellerstein , Wei Hong, Model-driven data acquisition in sensor networks, Proceedings of the Thirtieth international conference on Very large data bases, p.588-599, August 31-September 03, 2004, Toronto, Canada
|
 |
13
|
Cynthia Dwork , Ravi Kumar , Moni Naor , D. Sivakumar, Rank aggregation methods for the Web, Proceedings of the 10th international conference on World Wide Web, p.613-622, May 01-05, 2001, Hong Kong, Hong Kong
[doi> 10.1145/371920.372165]
|
| |
14
|
C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation revistied. In Manuscript, 2001.
|
| |
15
|
|
 |
16
|
|
| |
17
|
Minos Garofalakis and Dan Suciu, editors. IEEE Data Engineering Bulletin Special Issue on Probabilistic Data Management. March 2006.
|
 |
18
|
|
| |
19
|
Todd Green and Val Tannen. Models for incomplete and probabilistic information. In EDBT, 2006.
|
| |
20
|
|
| |
21
|
|
 |
22
|
|
 |
23
|
|
 |
24
|
|
| |
25
|
J.C. Borda. Mémoire sur les élections au scrutin. Histoire de l'Acad'emie Royale des Sciences, 1781.
|
| |
26
|
J.G. Kemeny. Mathematics without numbers. Daedalus, 88:571--591, 1959.
|
| |
27
|
J. Hodge and R.E. Klima. The mathematics of voting and elections: a hands-on approach. AMS, 2000.
|
 |
28
|
|
| |
29
|
Jian Li, Barna Saha, and Amol Deshpande. Ranking and clustering in probabilistic databases. http://www.cs.umd.edu/~lijian/paper/clusterrank_tr.pdf, 2008. Unpublished manuscript.
|
| |
30
|
|
| |
31
|
M.J. Condorcet. Éssai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. 1785.
|
| |
32
|
Christopher Re, Nilesh Dalvi, and Dan Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.
|
| |
33
|
|
| |
34
|
|
| |
35
|
Prithviraj Sen and Amol Deshpande. Representing and querying correlated tuples in probabilistic databases. In ICDE, 2007.
|
| |
36
|
|
| |
37
|
M. Soliman, I. Ilyas, and K. C. Chang. Top-k query processing in uncertain databases. In ICDE, 2007.
|
| |
38
|
Christopher Réand Dan Suciu. Efficient evaluation of having queries on a probabilistic database. In DBPL, 2007.
|
| |
39
|
Daisy Zhe Wang, Eirinaios Michelakis, Minos Garofalakis, and Joseph M. Hellerstein. BayesStore: Managing large, uncertain data repositories with probabilistic graphical models. In VLDB, Auckland, New Zealand, 2008.
|
| |
40
|
|
| |
41
|
Y. Wakabayashi. The complexity of computing medians of relations. In Resenhas, volume 3(3), pages 323--349, 1998.
|
| |
42
|
Xi Zhang and Jan Chomicki. On the semantics and evaluation of top-k queries in probabilistic databases. In DBRank, 2008.
|
|