ACM Home Page
Please provide us with feedback. Feedback
Top-k queries on uncertain data: on score distribution and typical answers
Full text PdfPdf (631 KB)
Source
International Conference on Management of Data archive
Proceedings of the 35th SIGMOD international conference on Management of data table of contents
Providence, Rhode Island, USA
SESSION: Research session 10: probabilistic databases I table of contents
Pages 375-388  
Year of Publication: 2009
ISBN:978-1-60558-551-2
Authors
Tingjian Ge  Brown University, Providence, RI, USA
Stan Zdonik  Brown University, Providence, RI, USA
Samuel Madden  Massachusetts Institute of Technology, Canbridge, MA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 61,   Downloads (12 Months): 202,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1559845.1559886
What is a DOI?

ABSTRACT

Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along this score-probability dimensions. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which is not dealt with in the previous work in the area. Our work includes a systematic empirical study on both real dataset and synthetic datasets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
 
5
6
7
 
8
R. Hassin and A. Tamir. Improved complexity bounds for location problems on the real line. In Operations Research Letters, 1991.
9
10
11
12
 
13
 
14
S. Lim, H. Balakrishnan, D. Gifford, S. Madden, D. Rus. Stochastic Motion Planning and Applications to Traffic. In WAFR, 2008.
 
15
C. Re, N. Dalvi and D. Suciu. Efficient Top-k Query Evaluation on Probabilistic Data. In ICDE, 2007.
 
16
K. Rosen. Discrete Mathematics and Its Applications. 1995.
 
17
P. Sen and A. Deshpande. Representing and Querying Correlated Tuples in Probabilistic Databases. In ICDE, 2007.
 
18
M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE, 2007.
19
 
20
J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In CIDR, 2005.
 
21
 
22
X. Zhang and J. Chomicki. On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases. In DBRank'08.
 
23
The R Project for Statistical Computing: www.r-project.org.
 
24

Collaborative Colleagues:
Tingjian Ge: colleagues
Stan Zdonik: colleagues
Samuel Madden: colleagues