ACM Home Page
Please provide us with feedback. Feedback
Incorporating constraints in probabilistic XML
Full text PdfPdf (808 KB)
Source
ACM Transactions on Database Systems (TODS) archive
Volume 34 ,  Issue 3  (August 2009) table of contents
Article No. 18  
Year of Publication: 2009
ISSN:0362-5915
Authors
Sara Cohen  The Hebrew University of Jerusalem, Jerusalem, Israel
Benny Kimelfeld  IBM Almaden Research Center, San Jose, CA
Yehoshua Sagiv  The Hebrew University of Jerusalem, Jerusalem, Israel
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 74,   Downloads (12 Months): 166,   Citation Count: 0
Additional Information:

abstract   references   index terms  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1567274.1567280
What is a DOI?

ABSTRACT

Constraints are important, not only for maintaining data integrity, but also because they capture natural probabilistic dependencies among data items. A probabilistic XML database (PXDB) is the probability subspace comprising the instances of a p-document that satisfy a set of constraints. In contrast to existing models that can express probabilistic dependencies, it is shown that query evaluation is tractable in PXDBs. The problems of sampling and determining well-definedness (i.e., whether the aforesaid subspace is nonempty) are also tractable. Furthermore, queries and constraints can include the aggregate functions count, max, min, and ratio. Finally, this approach can be easily extended to allow a probabilistic interpretation of constraints.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Abiteboul, S., Kimelfeld, B., Sagiv, Y., and Senellart, P. 2009. On the expressiveness of probabilistic XML models. VLDB J.
 
2
Abiteboul, S. and Senellart, P. 2006. Querying and updating probabilistic information in XML. In Proceedings of the International Conference on Extending Database Technology (EDBT). Springer, 1059--1068.
 
3
Bidoit, N. and Colazzo, D. 2007. Testing XML constraint satisfiability. Electr. Notes Theor. Comput. Sci. 174, 6, 45--61.
 
4
Bruno, N., Koudas, N., and Srivastava, D. 2002. Holistic twig joins: Optimal XML pattern matching. In Proceedings of the ACM SIGMOD Conference on Management of Data. ACM, 310--321.
 
5
Buneman, P., Davidson, S. B., Fan, W., Hara, C. S., and Tan, W. C. 2002. Keys for XML. Comput. Netw. 39, 5, 473--487.
 
6
Cohen, S., Kimelfeld, B., and Sagiv, Y. 2008. Incorporating constraints in probabilistic XML. In Proceedings of the 27th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, 109--118.
 
7
Cohen, S., Kimelfeld, B., and Sagiv, Y. 2009. Running tree automata on probabilistic XML. In Proceedings of the 28th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, 227--236.
 
8
Cooper, G. F. 1990. The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. 42, 2-3, 393--405.
 
9
Dagum, P. and Luby, M. 1993. Approximating probabilistic inference in bayesian belief networks is NP-hard. Artif. Intell. 60, 1, 141--153.
 
10
Dalvi, N. N. and Suciu, D. 2004. Efficient query evaluation on probabilistic databases. In Proceedings of the International Conference on Very Large Database (VLDB). Morgan Kaufmann, 864--875.
 
11
Dalvi, N. N. and Suciu, D. 2007. The dichotomy of conjunctive queries on probabilistic structures. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM, 293--302.
 
12
Fan, W., Kuper, G. M., and Siméon, J. 2002. A unified constraint model for XML. Comput. Netw. 39, 5, 489--505.
 
13
Fan, W. and Libkin, L. 2002. On XML integrity constraints in the presence of DTDs. J. ACM 49, 3, 368--406.
 
14
Fan, W. and Siméon, J. 2003. Integrity constraints for XML. J. Comput. Syst. Sci. 66, 1, 254-- 291.
 
15
Frick, M. and Grohe, M. 2002. The complexity of first-order and monadic second-order logic revisited. In Proceedings of the Annual IEEE Symposium on Logic in Computer Science (LICS). IEEE Computer Society, 215--224.
 
16
Hung, E., Getoor, L., and Subrahmanian, V. S. 2003a. Probabilistic interval XML. In Proceedings of the International Conference on Database Theory (ICDT). Springer, 361--377.
 
17
Hung, E., Getoor, L., and Subrahmanian, V. S. 2003b. PXML: A probabilistic semistructured data model and algebra. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society, 467--478.
 
18
Kimelfeld, B., Kosharovsky, Y., and Sagiv, Y. 2008. Query efficiency in probabilistic XML models. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 701--714.
 
19
Kimelfeld, B., Kosharovsky, Y., and Sagiv, Y. 2009. Query evaluation over probabilistic XML. VLDB J.
 
20
Kimelfeld, B. and Sagiv, Y. 2007a. Matching twigs in probabilistic XML. In Proceedings of the International Conference on Very Large Databases (VLDB). ACM, 27--38.
 
21
Kimelfeld, B. and Sagiv, Y. 2007b. Maximally joining probabilistic data. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM, 303--312.
 
22
Li, T., Shao, Q., and Chen, Y. 2006. PEPX: A query-friendly probabilistic XML database. In Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management. ACM Press, 848--849.
 
23
Neven, F. and Schwentick, T. 2002. Query automata over finite trees. Theor. Comput. Sci. 275, 1-2, 633--674.
 
24
Nierman, A. and Jagadish, H. V. 2002. ProTDB: Probabilistic data in XML. In Proceedings of the International Conference on Very Large Database (VLDB). ACM, 646--657.
 
25
Pearl, J. 1985. Bayesian networks: A model of self-activated memory for evidential reasoning. In Proceedings of the CogSci. Cognitive Science Society, University of California, Irvine, CA, 329--334.
 
26
Provan, J. S. and Ball, M. O. 1983. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput. 12, 4, 777--788.
 
27
Re, C., Dalvi, N. N., and Suciu, D. 2007. Efficient top-k query evaluation on probabilistic data. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE, 886--895.
 
28
Re, C. and Suciu, D. 2007. Efficient evaluation of HAVING queries on a probabilistic database. In Proceedings of the International Conference on Database Programming Languages (DBPL). Springer, 186--200.
 
29
Senellart, P. and Abiteboul, S. 2007. On the complexity of managing probabilistic XML data. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM, 283--292.
 
30
Tamaki, H. and Sato, T. 1986. OLD resolution with tabulation. In Proceedings of the International Conference on Logic Programming (ICLP). Springer, 84--98.
 
31
Toda, S. and Ogiwara, M. 1992. Counting classes are at least as hard as the polynomial-time hierarchy. SIAM J. Comput. 21, 2, 316--328.
 
32
van Keulen, M., de Keijzer, A., and Alink, W. 2005. A probabilistic XML approach to data integration. In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society, 459--470.
 
33
Warren, D. S. 1992. Memoing for logic programs. Comm. ACM 35, 3, 93--111.