ACM Home Page
Please provide us with feedback. Feedback
Management of probabilistic data: foundations and challenges
Full text MovMov (82:31),  PdfPdf (270 KB)
Source
Symposium on Principles of Database Systems archive
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Beijing, China
Pages: 1 - 12  
Year of Publication: 2007
ISBN:978-1-59593-685-1
Authors
Nilesh Dalvi  University of Washington
Dan Suciu  University of Washington
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 36,   Downloads (12 Months): 274,   Citation Count: 22
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1265530.1265531
What is a DOI?

ABSTRACT

Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as probabilities. We review the basic definitions of the probabilistic data model, present some fundamental theoretical result for query evaluation on probabilistic databases, and discuss several challenges, open problems, and research directions.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In EDBT, pages 1059--1068, 2006.
 
2
Ernest Adams. A Primer of Probability Logic. CSLI Publications, Stanford, California, 1998.
 
3
R. Ananthakrishna, S. Chaudhuri, and V. Ganti. Eliminating fuzzy duplicates in data warehouses. In VLDB, pages 586--597, 2002.
 
4
 
5
L. Antova, C. Koch, and D. Olteanu. 10^(10^6) worlds and beyond: Efficient representation and processing of incomplete information. In ICDE, 2007.
 
6
L. Antova, C. Koch, and D. Olteanu. World-set decompositions: Expressiveness and efficient algorithms. In ICDT, pages 194--208, 2007.
 
7
 
8
 
9
 
10
 
11
G. Borriello and F. Zhao. World-Wide Sensor Web: 2006 UW-MSR Summer Institute Semiahmoo Resort, Blaine, WA, 2006. www.cs.washington.edu/mssi/2006/schedule.html.
 
12
13
 
14
T. Choudhury, M. Philipose, D. Wyatt, and J. Lester. Towards activity databases: Using sensors and statistical models to summarize people's lives. IEEE Data Eng. Bull, 29(1):49--58, March 2006.
 
15
W. Cohen, P. Ravikumar, and S. Fienberg. A comparison of string distance metrics for name-matching tasks. In IIWeb, pages 73--78, 2003.
 
16
 
17
 
18
 
19
N. Dalvi, G. Miklau, and D. Suciu. Asymptotic conditional probabilities for conjunctive queries. In ICDT, 2005.
 
20
N. Dalvi, Chris Re, and D. Suciu. Query evaluation on probabilistic databases. IEEE Data Engineering Bulletin, 29(1):25--31, 2006.
 
21
N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, Toronto, Canada, 2004.
 
22
23
 
24
Nilesh Dalvi. Query evaluation on a database given by a random graph. In ICDT, pages 149--163, 2007.
25
 
26
27
28
 
29
A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In VLDB, pages 588--599, 2004.
 
30
A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Using probabilistic models for data management in acquisitional environments. In CIDR, pages 317--328, 2005.
 
31
A. Doan, R. Ramakrishnan, F. Chen, P. DeRose, Y. Lee, R. McCann, M. Sayyadian, and W. Shen. Community information management. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Data Management, 29(1):64--72, March 2006.
 
32
33
 
34
Ivan Felligi and Alan Sunter. A theory for record linkage. Journal of the American Statistical Society, 64:1183--1210, 1969.
35
36
 
37
 
38
Minos Garofalakis and Dan Suciu. Special issue on probabilistic data management. IEEE Data Engineering Bulletin, pages 1--72, 2006.
 
39
Lise Getoor. An introduction to probabilistic graphical models for relational data. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Data Management, 29(1):32--40, March 2006.
40
 
41
T. Green and V. Tannen. Models for incomplete and probabilistic information. IEEE Data Engineering Bulletin, 29(1):17--24, March 2006.
 
42
 
43
L. Gu, R. Baxter, D. Vickers, and C. Rainsford. Record linkage: Current practice and future directions. In CMIS Technical Report No. 03/83, 2003.
 
44
45
 
46
47
 
48
D. Heckerman. Tutorial on graphical models, June 2002.
49
 
50
E. Hung, L. Getoor, and V. S. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In ICDE, 2003.
51
 
52
T. S. Jayram, S. Kale, and E. Vee. Efficient aggregation algorithms for probabilistic data. In SODA, 2007.
 
53
T. S. Jayram, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. Zhu. Avatar information extraction system. IEEE Data Engineering Bulletin, 29(1):40--48, 2006.
 
54
 
55
R. Karp and M. Luby. Monte-Carlo algorithms for enumeration and reliability problems. In Proceedings of the annual ACM symposium on Theory of computing, 1983.
56
57
 
58
D. Koller. Representation, reasoning, learning. Computers and Thought 2001 Award talk.
59
 
60
J. Lester, T. Choudhury, N. Kern, G. Borriello, and B. Hannaford. A hybrid discriminative/generative approach for modeling human activities. In IJCAI, pages 766--772, 2005.
 
61
J. Madhavan, S. Cohen, X. Dong, A. Halevy, S. Jeffery, D. Ko, and C. Yu. Web-scale data integration: You can afford to pay as you go. In CIDR, pages 342--350, 2007.
62
 
63
Radford Neal. Probabilistic inference using Markov Chain Monte Carlo methods. Technical Report CRG-TR-93-1, Univ. of Toronto, 1993.
 
64
Christos Papadimitriou. Computational Complexity. Addison Wesley Publishing Company, 1994.
 
65
 
66
S. Philippi and J. Kohler. Addressing the problems with life-science databases for traditional uses and systems biology. Nature Reviews Genetics, 7:481--488, June 2006.
 
67
J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput., 12(4):777--788, 1983.
 
68
C. Re, N. Dalvi, and D. Suciu. Efficient Top-k query evaluation on probabilistic data. In ICDE, 2007.
 
69
Christopher Ré. Applications of probabilistic constraints. Technical Reprot TR2007-03-03, University of Washington, Seattle, Washington, March 2007.
70
 
71
Sunita Sarawagi. Automation in information extraction and data integration. Tutorial presented at VLDB'2002.
 
72
Prithviraj Sen and Amol Deshpande. Representing and querying correlated tuples in probabilistic databases. In ICDE, 2007.
 
73
W. Shen, X. Li, and A. Doan. Constraint-based entity matching. In AAAI, pages 862--867, 2005.
74
 
75
L. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comput., 8:410--421, 1979.
 
76
77
 
78
79
 
80
William Winkler. The state of record linkage and current research problems. Technical report, Statistical Research Division, U.S. Bureau of the Census, 1999.
 
81
Y. Zabiyaka and A. Darwiche. Functional treewidth: Bounding complexity in the presence of functional dependencies. In SAT, pages 116--129, 2006.
 
82
alonhalevy.blogspot.com/2007_01_01_archive.html.
 
83
www.flickr.com.
 
84
base.google.com.
 
85

CITED BY  22