|
ABSTRACT
Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as probabilities. We review the basic definitions of the probabilistic data model and present some fundamental theoretical results for query evaluation on probabilistic databases.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In EDBT, pages 1059--1068, 2006.
|
| |
2
|
E. Adams. A Primer of Probability Logic. CSLI Publications, Stanford, California, 1998.
|
| |
3
|
|
| |
4
|
L. Antova, C. Koch, and D. Olteanu. 10^(10^6) worlds and beyond: Efficient representation and processing of incomplete information. In ICDE, 2007.
|
| |
5
|
L. Antova, C. Koch, and D. Olteanu. World-set decompositions: Expressiveness and efficient algorithms. In ICDT, pages 194--208, 2007.
|
| |
6
|
|
| |
7
|
|
| |
8
|
G. Borriello and F. Zhao. World-Wide Sensor Web: 2006 UWMSR Summer Institute Semiahmoo Resort, Blaine, WA, 2006. www.cs.washington.edu/mssi/2006/schedule.html.
|
| |
9
|
T. Choudhury, M. Philipose, D. Wyatt, and J. Lester. Towards activity databases: Using sensors and statistical models to summarize people's lives. IEEE Data Eng. Bull, 29(1):49--58, March 2006.
|
 |
10
|
|
| |
11
|
|
| |
12
|
Robert G. Cowell , Steffen L. Lauritzen , A. Philip David , David J. Spiegelhalter , V. Nair , J. Lawless , M. Jordan, Probabilistic Networks and Expert Systems, Springer-Verlag New York, Inc., Secaucus, NJ, 1999
|
| |
13
|
|
| |
14
|
N. Dalvi, C. Re, and D. Suciu. Query evaluation on probabilistic databases. IEEE Data Engineering Bulletin, 29(1):25--31, 2006.
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
 |
18
|
|
| |
19
|
Amol Deshpande , Carlos Guestrin , Samuel R. Madden , Joseph M. Hellerstein , Wei Hong, Model-driven data acquisition in sensor networks, Proceedings of the Thirtieth international conference on Very large data bases, p.588-599, August 31-September 03, 2004, Toronto, Canada
|
| |
20
|
A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Using probabilistic models for data management in acquisitional environments. In CIDR, pages 317--328, 2005.
|
| |
21
|
|
| |
22
|
A. Doan, R. Ramakrishnan, F. Chen, P. DeRose, Y. Lee, R. McCann, M. Sayyadian, and W. Shen. Community information management. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Data Management, 29(1):64--72, March 2006.
|
| |
23
|
Magdalena Balazinska , Amol Deshpande , Michael J. Franklin , Phillip B. Gibbons , Jim Gray , Mark Hansen , Michael Liebhold , Suman Nath , Alexander Szalay , Vincent Tao, Data Management in the Worldwide Sensor Web, IEEE Pervasive Computing, v.6 n.2, p.30-40, April 2007
[doi> 10.1109/MPRV.2007.27]
|
 |
24
|
Oren Etzioni , Michael Cafarella , Doug Downey , Stanley Kok , Ana-Maria Popescu , Tal Shaked , Stephen Soderland , Daniel S. Weld , Alexander Yates, Web-scale information extraction in knowitall: (preliminary results), Proceedings of the 13th international conference on World Wide Web, May 17-20, 2004, New York, NY, USA
[doi> 10.1145/988672.988687]
|
 |
25
|
Erich Grädel , Yuri Gurevich , Colin Hirsch, The complexity of query reliability, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.227-234, June 01-04, 1998, Seattle, Washington, United States
[doi> 10.1145/275487.295124]
|
 |
26
|
|
| |
27
|
T. Green and V. Tannen. Models for incomplete and probabilistic information. IEEE Data Engineering Bulletin, 29(1):17--24, March 2006.
|
| |
28
|
|
| |
29
|
|
| |
30
|
D. Heckerman, Tutorial on graphical models, June 2002.
|
| |
31
|
E. Hung, L. Getoor, and V. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In ICDE, 2003.
|
 |
32
|
|
| |
33
|
T. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Engineering Bulletin, 29(1):40--48, 2006.
|
| |
34
|
R. Karp and M. Luby. Monte-Carlo algorithms for enumeration and reliability problems. In Proceedings of the annual ACM symposium on Theory of computing, 1983.
|
 |
35
|
|
| |
36
|
G. Kuper, L. Libkin, and J. P. (Eds.). Constraint Databases. Springer, 2000.
|
| |
37
|
J. Lester, T. Choudhury, N. Kern, G. Borriello, and B. Hannaford. A hybrid discriminative/generative approach for modeling human activities. In IJCAI, pages 766--772, 2005.
|
| |
38
|
C. Papadimitriou. Computational Complexity. Addison Wesley Publishing Company, 1994.
|
| |
39
|
S. Philippi and J. Kohler. Addressing the problems with life-science databases for traditional uses and systems biology. Nature Reviews Genetics, 7:481--488, June 2006.
|
| |
40
|
J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4):777--788, 1983.
|
| |
41
|
C. Re, N. Dalvi, and D. Suciu. Efficient Top-k query evaluation on probabilistic data. In ICDE, 2007.
|
| |
42
|
|
| |
43
|
C. Re and D. Suciu. Approximate lineage for probabilistic databases. Technical Report 2008-03-02, University of Washington, Seattle, WA, 2008.
|
| |
44
|
|
 |
45
|
|
 |
46
|
|
| |
47
|
L. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comput., 8:410--421, 1979.
|
| |
48
|
|
 |
49
|
|
|