ACM Home Page
Please provide us with feedback. Feedback
MCDB: a monte carlo approach to managing uncertain data
Full text PdfPdf (333 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Research Session 15: Probabilistic I table of contents
Pages 687-700  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Ravi Jampani  University of Florida, Gainesville, FL, USA
Fei Xu  University of Florida, Gainesville, FL, USA
Mingxi Wu  University of Florida, Gainesville, FL, USA
Luis Leopoldo Perez  University of Florida, Gainesville, FL, USA
Christopher Jermaine  University of Florida, Gainesville, FL, USA
Peter J. Haas  IBM Almaden Research Center, San Jose, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 118,   Downloads (12 Months): 539,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376686
What is a DOI?

ABSTRACT

To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system's ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty model to be dynamically parameterized according to the current state of the database. We introduce MCDB, a system for managing uncertain data that is based on a Monte Carlo approach. MCDB represents uncertainty via "VG functions," which are used to pseudorandomly generate realized values for uncertain attributes. VG functions can be parameterized on the results of SQL queries over "parameter tables" that are stored in the database, facilitating what-if analyses. By storing parameters, and not probabilities, and by estimating, rather than exactly computing, the probability distribution over possible query answers, MCDB avoids many of the limitations of prior systems. For example, MCDB can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles. To achieve good performance, MCDB uses novel query processing techniques, executing a query plan exactly once, but over "tuple bundles" instead of ordinary tuples. Experiments indicate that our enhanced functionality can be obtained with acceptable overheads relative to traditional systems.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
L. Antova, C. Koch, and D. Olteanu. 10106 worlds and beyond: Efficient representation and processing of incomplete information. In ICDE, pages 606--615, 2007.
 
4
L. Antova, C. Koch, and D. Olteanu. MayBMS: Managing incomplete information with probabilistic world-set decompositions. In ICDE, pages 1479--1480, 2007.
5
 
6
 
7
 
8
9
 
10
 
11
12
 
13
L. Devroye. Non-Uniform Random Variate Generation. Springer, 1986.
 
14
L. Devroye and G. Lugosi. Combinatorial Methods in Density Estimation. Springer, 2001.
 
15
G. Fishman. Monte Carlo: Concepts, Algorithms, and Applications. Springer, 1996.
16
 
17
J. E. Gentle. Random Number Generation and Monte Carlo Methods. Springer, second edition, 2003.
 
18
 
19
20
 
21
S. G. Henderson and B. L. Nelson, editors. Simulation. North-Holland, 2006.
22
 
23
24
 
25
R. Murthy and J. Widom. Making aggregation work in uncertain and probabilistic databases. In Proc. 1st Int. VLDB Work. Mgmt. Uncertain Data (MUD), pages 76--90, 2007.
 
26
A. Nadas. An extension of a theorem by Chow and Robbins on sequential confidence intervals for the mean. Ann. Math. Statist., 40(2):667--671, 1969.
 
27
 
28
A. O'Hagan and J. J. Forster. Bayesian Inference. Volume 2B of Kendall's Advanced Theory of Statistics. Arnold, second edition, 2004.
 
29
 
30
C. Re, N. N. Dalvi, and D. Suciu. Query evaluation on probabilistic databases. IEEE Data Eng. Bull., 29(1):25--31, 2006.
 
31
C. Re, N. N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, pages 886--895, 2007.
 
32
 
33
P. Sen and A. Deshpande. Representing and querying correlated tuples in probabilistic databases. In ICDE, pages 596--605, 2007.
 
34
R. J. Serfling. Approximation Theorems of Mathematical Statistics. Wiley, 1980.
 
35
S. Singh, C. Mayfield, S. Prabhakar, R. Shah, and S. E. Hambrusch. Indexing uncertain categorical data. In ICDE, pages 616--625, 2007.
 
36
J. Xie, J. Yang, Y. Chen, H. Wang, and P. Yu. A sampling-based approach to information recovery. In ICDE, 2008. To appear.

CITED BY  8

Collaborative Colleagues:
Ravi Jampani: colleagues
Fei Xu: colleagues
Mingxi Wu: colleagues
Luis Leopoldo Perez: colleagues
Christopher Jermaine: colleagues
Peter J. Haas: colleagues