ACM Home Page
Please provide us with feedback. Feedback
Exploiting shared correlations in probabilistic databases
Full text PdfPdf (447 KB)
Source
Proceedings of the VLDB Endowment archive
Volume 1 ,  Issue 1  (August 2008) table of contents
SESSION: Uncertian databases table of contents
Pages 809-820  
Year of Publication: 2008
ISSN:2150-8097
Authors
Prithviraj Sen  University of Maryland, College Park, MD
Amol Deshpande  University of Maryland, College Park, MD
Lise Getoor  University of Maryland, College Park, MD
Publisher
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 56,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1453856.1453944
What is a DOI?

ABSTRACT

There has been a recent surge in work in probabilistic databases, propelled in large part by the huge increase in noisy data sources --- from sensor data, experimental data, data from uncurated sources, and many others. There is a growing need for database management systems that can efficiently represent and query such data. In this work, we show how data characteristics can be leveraged to make the query evaluation process more efficient. In particular, we exploit what we refer to as shared correlations where the same uncertainties and correlations occur repeatedly in the data. Shared correlations occur mainly due to two reasons: (1) Uncertainty and correlations usually come from general statistics and rarely vary on a tuple-to-tuple basis; (2) The query evaluation procedure itself tends to re-introduce the same correlations. Prior work has shown that the query evaluation problem on probabilistic databases is equivalent to a probabilistic inference problem on an appropriately constructed probabilistic graphical model (PGM). We leverage this by introducing a new data structure, called the random variable elimination graph (rv-elim graph) that can be built from the PGM obtained from query evaluation. We develop techniques based on bisimulation that can be used to compress the rv-elim graph exploiting the presence of shared correlations in the PGM, the compressed rv-elim graph can then be used to run inference. We validate our methods by evaluating them empirically and show that even with a few shared correlations significant speed-ups are possible.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
4
5
 
6
 
7
A. Das Sarma, M. Theobald, and J. Widom. Exploiting lineage for confidence computation in uncertain and probabilistic databases. In ICDE, 2008.
 
8
R. de Salvo Braz, E. Amir, and D. Roth. Lifted first-order probabilistic inference. In International Joint Conference on Artificial Inteligence, 2005.
 
9
 
10
 
11
12
 
13
L. Getoor, N. Friedman, D. Koller, and B. Taskar. Learning probabilistic models with link uncertainty. Journal of Machine Learning Research, 2002.
 
14
 
15
 
16
C. Huang and A. Darwiche. Inference in belief networks: A procedural guide. International Journal of Approximate Reasoning, 1994.
17
 
18
U. Kjaerulff. Triangulation of graphs - algorithms giving small total state space. Technical report, University of Aalborg, Denmark, 1990.
 
19
 
20
D. Poole. First-order probabilistic inference. In International Joint Conference on Artificial Inteligence, 2003.
 
21
C. Re, N. Dalvi, and D. Suciu. Efficient top-k query evaluation on probabilistic data. In ICDE, 2007.
 
22
 
23
 
24
P. Sen and A. Deshpande. Representing and querying correlated tuples in probabilistic databases. In ICDE, 2007.
 
25
S. Singh, C. Mayfield, S. Prabhakar, S. Hambrusch, and R. Shah. Indexing uncertain categorical data. In ICDE, 2007.
 
26
S. Singh, C. Mayfield, R. Shah, S. Prabhakar, S. Hambrusch, J. Neville, and R. Cheng. Database support for probabilistic attributes and tuples. In ICDE, 2008.
 
27
N. L. Zhang and D. Poole. A simple approach to bayesian network computations. In Canadian Conference on AI, 1994.


Collaborative Colleagues:
Prithviraj Sen: colleagues
Amol Deshpande: colleagues
Lise Getoor: colleagues