|
ABSTRACT
With large amounts of correlated probabilistic data being generated in a wide range of application domains including sensor networks, information extraction, event detection etc., effectively managing and querying them has become an important research direction. While there is an exhaustive body of literature on querying independent probabilistic data, supporting efficient queries over large-scale, correlated databases remains a challenge. In this paper, we develop efficient data structures and indexes for supporting inference and decision support queries over such databases. Our proposed hierarchical data structure is suitable both for in-memory and disk-resident databases. We represent the correlations in the probabilistic database using a junction tree over the tuple-existence or attribute-value random variables, and use tree partitioning techniques to build an index structure over it. We show how to efficiently answer inference and aggregation queries using such an index, resulting in orders of magnitude performance benefits in most cases. In addition, we develop novel algorithms for efficiently keeping the index structure up-to-date as changes (inserts, updates) are made to the probabilistic database. We present a comprehensive experimental study illustrating the benefits of our approach to query processing in probabilistic databases.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
A. Berry, P. Heggernes, and Y. Villanger. A vertex incremental approach for maintaining chordality. Discrete Mathematics, 2006.
|
 |
5
|
|
| |
6
|
Reynold Cheng , Yuni Xia , Sunil Prabhakar , Rahul Shah , Jeffrey Scott Vitter, Efficient indexing methods for probabilistic threshold queries over uncertain data, Proceedings of the Thirtieth international conference on Very large data bases, p.876-887, August 31-September 03, 2004, Toronto, Canada
|
| |
7
|
A. Choi and A. Darwiche. Focusing generalizations of belief propagation on targeted queries. In AAAI, 2008.
|
| |
8
|
Robert G. Cowell , Steffen L. Lauritzen , A. Philip David , David J. Spiegelhalter , V. Nair , J. Lawless , M. Jordan, Probabilistic Networks and Expert Systems, Springer-Verlag New York, Inc., Secaucus, NJ, 1999
|
 |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
R. Dechter. Constraint Networks (Survey). John Wiley&Sons, 1992
|
| |
13
|
A. Deshpande, L. Getoor, and P. Sen. Graphical Models for Uncertain Data. Managing and Mining Uncertain Data. Charu Aggarwal ed., Springer, 2009.
|
| |
14
|
A. Deshpande, C. Guestrin, and S. Madden. Using probabilistic models for data management in acquisitional environments. In CIDR, 2005.
|
| |
15
|
|
| |
16
|
J. Finn and J. Frank. Optimal junction trees. In UAI, 1994.
|
| |
17
|
|
| |
18
|
C. Huang and A. Darwiche. Inference in belief networks: A procedural guide. Int. J. Approx. Reasoning, 1996.
|
 |
19
|
|
| |
20
|
T. S. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Eng. Bull., 29(1), 2006.
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
S. Kundu and J. Misra. A linear tree partitioning algorithm. SIAM J. Comput., 1977.
|
 |
25
|
|
| |
26
|
|
| |
27
|
R. Mateescu and R. Dechter. And/or cutset conditioning. In IJCAI, 2005.
|
| |
28
|
D. Patterson, L. Liao, D. Fox, and H. Kautz. Inferring high level behavior from low level sensors. In UBICOMP, 2003.
|
 |
29
|
|
| |
30
|
P. Sen and A. Deshpande. Representing and querying correlated tuples in probabilistic databases. In ICDE, 2007.
|
| |
31
|
|
| |
32
|
S. Singh, C. Mayfield, S. Prabhakar, R. Shah, and S. E. Hambrusch. Indexing uncertain categorical data. In ICDE, 2007.
|
| |
33
|
Yufei Tao , Reynold Cheng , Xiaokui Xiao , Wang Kay Ngai , Ben Kao , Sunil Prabhakar, Indexing multi-dimensional uncertain data with arbitrary probability density functions, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
34
|
|
| |
35
|
J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In CIDR, 2005.
|
| |
36
|
J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized belief propagation. In NIPS, 2000.
|
|