ACM Home Page
Please provide us with feedback. Feedback
Holistic twig joins: optimal XML pattern matching
Full text PdfPdf (1.34 MB)
Source International Conference on Management of Data archive
Proceedings of the 2002 ACM SIGMOD international conference on Management of data table of contents
Madison, Wisconsin
SESSION: Research sessions: XML II table of contents
Pages: 310 - 321  
Year of Publication: 2002
ISBN:1-58113-497-5
Authors
Nicolas Bruno  Columbia University
Nick Koudas  AT&T Labs---Research
Divesh Srivastava  AT&T Labs---Research
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 33,   Downloads (12 Months): 210,   Citation Count: 151
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/564691.564727
What is a DOI?

ABSTRACT

XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, and twig matching is achieved by: (i) using structural join algorithms to match the binary relationships against the XML database, and (ii) stitching together these basic matches. A limitation of this approach for matching twig patterns is that intermediate result sizes can get large, even when the input and output sizes are more manageable.In this paper, we propose a novel holistic twig join algorithm, TwigStack, for matching an XML query twig pattern. Our technique uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern. When the twig pattern uses only ancestor-descendant relationships between elements, TwigStack is I/O and CPU optimal among all sequential algorithms that read the entire input: it is linear in the sum of sizes of the input lists and the final result list, but independent of the sizes of intermediate results. We then show how to use (a modification of) B-trees, along with TwigStack, to match query twig patterns in sub-linear time. Finally, we complement our analysis with experimental results on a range of real and synthetic data, and query twig patterns.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
S. Boag, D. Chamberlin, M. Fernandez, D. Florescu, J. Robie, J. Simeon, and M. Stefanescu XQuery 1.0: An XML Query Language. W3C Working Draft. Available from http://www.w3.org/TR/xquery, Dec. 2001.
 
3
N. Bruno, N. Koudas, D. Srivastava. Holistic Twig Joins: Optimal XML Pattern Matching. Technical Report. Columbia University. March 2002.
 
4
 
5
6
7
 
8
A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. XML-QL: A query language for XML. Available from http://www.w3.org/TR/NOTE-xml-ql., 1998.
 
9
D. DeWitt, J. Naughton, and D. Schneider. An evaluation of non equijoin algorithms. Proceedings of ACM SIGMOD, 1991.
 
10
 
11
 
12
D. Florescu and D. Kossman. Storing and querying XML data using an RDMBS. IEEE Data Engineering Bulletin, 22(3):27-34, 1999.
13
14
15
16
 
17
 
18
U. of Washington. The Tukwila system. Available from http://data.cs.washington.edu/integration/tukwila/.
 
19
U. of Wisconsin. The Niagara system. Available from http://www.cs.wisc.edu/niagara/.
20
21
 
22
 
23
 
24
 
25
XMach-1. Available from http://dbs.uni-leipzig.de/en/projekte/XML/XmlBenchmarking.html.
 
26
The XML benchmark project. Available from http://www.xml-benchmark.org.
27

CITED BY  151

Collaborative Colleagues:
Nicolas Bruno: colleagues
Nick Koudas: colleagues
Divesh Srivastava: colleagues