ACM Home Page
Please provide us with feedback. Feedback
XSKETCH synopses for XML data graphs
Full text PdfPdf (886 KB)
Source ACM Transactions on Database Systems (TODS) archive
Volume 31 ,  Issue 3  (September 2006) table of contents
Pages: 1014 - 1063  
Year of Publication: 2006
ISSN:0362-5915
Authors
Neoklis Polyzotis  University of California, Santa Cruz, Santa Cruz, CA
Minos Garofalakis  Intel Research Berkeley, Berkeley, CA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 95,   Citation Count: 5
Additional Information:

appendices and supplements   abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1166074.1166082
What is a DOI?

APPENDICES and SUPPLEMENTS
Online appendix to designing mediation for context-aware applications. The appendix supports the information on page 1014.


ABSTRACT

Effective support for XML query languages is becoming increasingly important with the emergence of new applications that access large volumes of XML data. All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation and branching through the label structure of the XML data graph, and (2) predicates on the values of specific path/branch nodes, in order to reach the desired data elements. Clearly, optimizing such queries requires approximating the result cardinality of the referenced paths and hence hinges on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over the base XML data. In this article, we introduce a novel approach to building and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation. Our proposed graph-synopsis model (termed XSketch) exploits localized graph stability and value-distribution summaries (e.g., histograms) to accurately approximate (in limited space) the path and branching distribution, as well as the complex correlation patterns that can exist between and across path structure and element values in the data graph. To the best of our knowledge, ours is the first work to address this timely problem in the most general setting of graph-structured XML data with values, and complex (branching) path expressions.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
 
4
Boag, S., Chamberlin, D., Fernndez, M. F., Florescu, D., Robie, J., and Simon, J. 2005. XQuery 1.0: An XML query language. W3C Candidate Recommendation. Go online to www.w3.org.
 
5
Bray, T., Paoli, J., Sperberg-McQueen, C. M., and Maler, E. 2000. Extensible Markup Language (XML) 1.0 (second edition). W3C Recommendation. Go online to www.w3.org/.
6
7
 
8
 
9
Clark, J. 1999. XSL Transformations (XSLT), Version 1.0. W3C Recommendation. Go online to www.w3.org.
 
10
Clark, J. and DeRose, S. 1999. XML path language (XPath), version 1.0. W3C Recommendation. Go online to www.w3.org.
 
11
DeRose, S., Maler, E., and Orchard, D. 2001. XML linking language (XLink), version 1.0. W3C Recommendation. Go online to www.w3.org.
12
 
13
Feller, W. 1968. An Introduction to Probability Theory and its Applications---Volume I, John Wiley & Sons, New York, NY.
 
14
Fowler, R. J., Paterson, M. S., and Tanimoto, S. L. 1981. Optimal packing and covering in the plane are NP-complete. Inform. Process. Lett. 12, 3 (Jun.), 133--137.
 
15
Fox, B. 1966. Discrete Optimization Via Marginal Analysis. Manage. Sci. 13, 3, 211--216.
16
 
17
 
18
 
19
 
20
Gonzalez, T. F. 1985. Clustering to minimize the maximum intercluster distance. Theoret. Comput. Sci. 38, 293--306.
 
21
 
22
23
 
24
 
25
Lim, L., Wang, M., Padmanabhan, S., Vitter, J., and Parr, R. 2002. XPathLearner: An on-line self-tuning markov histogram for XML path selectivity estimation. In Proceedings of the 28th International Conference on Very Large Data Bases. 442--453.
 
26
Megiddo, N. and Supowit, K. J. 1984. On the complexity of some common geometric location problems. SIAM J. Comput. 13, 1, 182--196.
 
27
 
28
Naughton, J. F., DeWitt, D. J., and et al 2001. The niagara internet query system. IEEE Data Eng. Bull. 24, 2.
 
29
 
30
 
31
32
 
33
 
34
35
 
36
37
38
 
39
Wang, W., Jiang, H., Lu, H., and Yu, J. X. 2004. Bloom histogram: Path selectivity estimation for XML data with updates. In Proceedings of the 30th International Conference on Very Large Data Bases. 240--251.
 
40
 
41
Wu, Y., Patel, J. M., and Jagadish, H. 2003. Structural join order selection for XML query optimization. In Proceedings of the 19th International Conference on Data Engineering. 443--454.
 
42


Collaborative Colleagues:
Neoklis Polyzotis: colleagues
Minos Garofalakis: colleagues