|
ABSTRACT
Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings for a variety of reasons, including motion prediction and human behavior modeling. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex spatio/temporal correlation patterns present in uncertain sensory data. Unfortunately, to date, most existing approaches to probabilistic database systems have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures: Probabilistic information is typically associated with individual data tuples, with only limited or no support for effectively capturing and reasoning about complex data correlations. In this paper, we introduce BayesStore, a novel probabilistic data management architecture built on the principle of handling statistical models and probabilistic inference tools as first-class citizens of the database system. Adopting a machine-learning view, BAYESSTORE employs concise statistical relational models to effectively encode the correlation patterns between uncertain data, and promotes probabilistic inference and statistical model manipulation as part of the standard DBMS operator repertoire to support efficient and sound query processing. We present BAYESSTORE's uncertainty model based on a novel, first-order statistical model, and we redefine traditional query processing operators, to manipulate the data and the probabilistic models of the database in an efficient manner. Finally, we validate our approach, by demonstrating the value of exploiting data correlations during query processing, and by evaluating a number of optimizations which significantly accelerate query processing.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
L. Antova, T. Jansen, C. Koch, and D. Olteanu. Fast and simple relational processing of uncertain data. In ICDE, 2008.
|
| |
2
|
L. Antova, D. Olteanu, and S. Scherzinger. 10(106) worlds and beyond: Efficient representation and processing of incomplete information. In ICDE, 2007.
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
Robert G. Cowell , Steffen L. Lauritzen , A. Philip David , David J. Spiegelhalter , V. Nair , J. Lawless , M. Jordan, Probabilistic Networks and Expert Systems, Springer-Verlag New York, Inc., Secaucus, NJ, 1999
|
| |
7
|
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
 |
12
|
|
| |
13
|
M. I. Jordan. Graphical models. Statistical Science (Special Issue on Bayesian Statistics), 19:140--155, 2004.
|
| |
14
|
E. Michelakis, D. Z. Wang, M. Garofalakis, and J. M. Hellerstein. Granularity conscious modeling for probabilistic databases. In DUNE, 2007.
|
| |
15
|
|
| |
16
|
D. Poole. First-order Probabilistic Inference. In IJCAI, 2003.
|
| |
17
|
P. Sen and A. Deshpande. Representing and Querying Correlated Tuples in Probabilistic Databases. In ICDE, 2007.
|
| |
18
|
P. Sen, A. Deshpande, and L. Getoor. Representing tuple and attribute uncertainty in probabilistic databases. In DUNE, 2007.
|
| |
19
|
B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In UAI, 2002.
|
CITED BY 2
|
|
Eirinaios Michelakis , Rajasekar Krishnamurthy , Peter J. Haas , Shivakumar Vaithyanathan, Uncertainty management in rule-based information extraction systems, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
|
|