| Top-k queries on uncertain data: on score distribution and typical answers |
| Full text |
Pdf
(631 KB)
|
Source
|
International Conference on Management of Data
archive
Proceedings of the 35th SIGMOD international conference on Management of data
table of contents
Providence, Rhode Island, USA
SESSION: Research session 10: probabilistic databases I
table of contents
Pages 375-388
Year of Publication: 2009
ISBN:978-1-60558-551-2
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 61, Downloads (12 Months): 202, Citation Count: 0
|
|
|
ABSTRACT
Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along this score-probability dimensions. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which is not dealt with in the previous work in the area. Our work includes a systematic empirical study on both real dataset and synthetic datasets.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
 |
7
|
Ronald Fagin , Ravi Kumar , Mohammad Mahdian , D. Sivakumar , Erik Vee, Comparing and aggregating rankings with ties, Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 14-16, 2004, Paris, France
[doi> 10.1145/1055558.1055568]
|
| |
8
|
R. Hassin and A. Tamir. Improved complexity bounds for location problems on the real line. In Operations Research Letters, 1991.
|
 |
9
|
|
 |
10
|
Bret Hull , Vladimir Bychkovsky , Yang Zhang , Kevin Chen , Michel Goraczko , Allen Miu , Eugene Shih , Hari Balakrishnan , Samuel Madden, CarTel: a distributed mobile sensor computing system, Proceedings of the 4th international conference on Embedded networked sensor systems, October 31-November 03, 2006, Boulder, Colorado, USA
[doi> 10.1145/1182807.1182821]
|
 |
11
|
|
 |
12
|
Ravi Jampani , Fei Xu , Mingxi Wu , Luis Leopoldo Perez , Christopher Jermaine , Peter J. Haas, MCDB: a monte carlo approach to managing uncertain data, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
[doi> 10.1145/1376616.1376686]
|
| |
13
|
|
| |
14
|
S. Lim, H. Balakrishnan, D. Gifford, S. Madden, D. Rus. Stochastic Motion Planning and Applications to Traffic. In WAFR, 2008.
|
| |
15
|
C. Re, N. Dalvi and D. Suciu. Efficient Top-k Query Evaluation on Probabilistic Data. In ICDE, 2007.
|
| |
16
|
K. Rosen. Discrete Mathematics and Its Applications. 1995.
|
| |
17
|
P. Sen and A. Deshpande. Representing and Querying Correlated Tuples in Probabilistic Databases. In ICDE, 2007.
|
| |
18
|
M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE, 2007.
|
 |
19
|
Nesime Tatbul , Mark Buller , Reed Hoyt , Steve Mullen , Stan Zdonik, Confidence-based data management for personal area sensor networks, Proceeedings of the 1st international workshop on Data management for sensor networks: in conjunction with VLDB 2004, August 30-30, 2004, Toronto, Canada
[doi> 10.1145/1052199.1052204]
|
| |
20
|
J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In CIDR, 2005.
|
| |
21
|
|
| |
22
|
X. Zhang and J. Chomicki. On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases. In DBRank'08.
|
| |
23
|
The R Project for Statistical Computing: www.r-project.org.
|
| |
24
|
|
|