ACM Home Page
Please provide us with feedback. Feedback
Consensus answers for queries over probabilistic databases
Full text PdfPdf (457 KB)
Source
Symposium on Principles of Database Systems archive
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Providence, Rhode Island, USA
SESSION: Uncertain data table of contents
Pages 259-268  
Year of Publication: 2009
ISBN:978-1-60558-553-6
Authors
Jian Li  University of Maryland, College Park, MD, USA
Amol Deshpande  University of Maryland, College Park, MD, USA
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): n/a,   Downloads (12 Months): n/a,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1559795.1559835
What is a DOI?

ABSTRACT

We address the problem of finding a "best" deterministic query answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer) which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). This problem can be seen as a generalization of the well-studied inconsistent information aggregation problems (e.g. rank aggregation) to probabilistic databases. We consider this problem for various types of queries including SPJ queries, Top-k ranking queries, group-by aggregate queries, and clustering. For different distance metrics, we obtain polynomial time optimal or approximation algorithms for computing the consensus answers (or prove NP-hardness). Most of our results are for a general probabilistic database model, called and/xor tree model, which significantly generalizes previous probabilistic database models like x-tuples and block-independent disjoint models, and is of independent interest.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
4
 
5
 
6
7
 
8
9
 
10
11
 
12
13
 
14
C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation revistied. In Manuscript, 2001.
 
15
16
 
17
Minos Garofalakis and Dan Suciu, editors. IEEE Data Engineering Bulletin Special Issue on Probabilistic Data Management. March 2006.
18
 
19
Todd Green and Val Tannen. Models for incomplete and probabilistic information. In EDBT, 2006.
 
20
 
21
22
23
24
 
25
J.C. Borda. Mémoire sur les élections au scrutin. Histoire de l'Acad'emie Royale des Sciences, 1781.
 
26
J.G. Kemeny. Mathematics without numbers. Daedalus, 88:571--591, 1959.
 
27
J. Hodge and R.E. Klima. The mathematics of voting and elections: a hands-on approach. AMS, 2000.
28
 
29
Jian Li, Barna Saha, and Amol Deshpande. Ranking and clustering in probabilistic databases. http://www.cs.umd.edu/~lijian/paper/clusterrank_tr.pdf, 2008. Unpublished manuscript.
 
30
 
31
M.J. Condorcet. Éssai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. 1785.
 
32
Christopher Re, Nilesh Dalvi, and Dan Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.
 
33
 
34
 
35
Prithviraj Sen and Amol Deshpande. Representing and querying correlated tuples in probabilistic databases. In ICDE, 2007.
 
36
 
37
M. Soliman, I. Ilyas, and K. C. Chang. Top-k query processing in uncertain databases. In ICDE, 2007.
 
38
Christopher Réand Dan Suciu. Efficient evaluation of having queries on a probabilistic database. In DBPL, 2007.
 
39
Daisy Zhe Wang, Eirinaios Michelakis, Minos Garofalakis, and Joseph M. Hellerstein. BayesStore: Managing large, uncertain data repositories with probabilistic graphical models. In VLDB, Auckland, New Zealand, 2008.
 
40
 
41
Y. Wakabayashi. The complexity of computing medians of relations. In Resenhas, volume 3(3), pages 323--349, 1998.
 
42
Xi Zhang and Jan Chomicki. On the semantics and evaluation of top-k queries in probabilistic databases. In DBRank, 2008.