ACM Home Page
Please provide us with feedback. Feedback
A heuristic approach for checking containment of generalized tree-pattern queries
Full text PdfPdf (873 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: DB/industry: XML data integration and XML query optimization table of contents
Pages 551-560  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Pawel Placek  NJIT, Newark, NJ, USA
Dimitri Theodoratos  NJIT, Newark, NJ, USA
Stefanos Souldatos  NTUA, Athens, Greece
Theodore Dalamagas  IMIS, Athens, Greece
Timos Sellis  IMIS and NTUA, Athens, Greece
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 82,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458155
What is a DOI?

ABSTRACT

Query processing techniques for XML data have focused mainly on tree-pattern queries (TPQs). However, the need for querying XML data sources whose structure is very complex or not fully known to the user, and the need to integrate multiple XML data sources with different structures have driven, recently, the suggestion of query languages that relax the complete specification of a tree pattern. In order to implement the processing of such languages in current DBMSs, their containment problem has to be efficiently solved.

In this paper, we consider a query language which generalizes TPQs by allowing the partial specification of a tree pattern. Partial tree-pattern queries (PTPQs) constitute a large fragment of XPath that flexibly permits the specification of a broad range of queries from keyword queries without structure, to queries with partial specification of the structure, to complete TPQs. We address the containment problem for PTPQs. This problem becomes more complex in the context of PTPQs because the partial specification of the structure allows new, non-trivial, structural expressions to be inferred from those explicitly specified in a query. We show that the containent problem cannot be characterized by homomorphisms between PTPQs, even when PTPQs are put in a canonical form that comprises all derived structural expressions. We provide necessary and sufficient conditions for this problem in terms of homomorphisms between PTPQs and (a possibly exponential number of) TPQs. To cope with the high complexity of PTPQ containment, we suggest a heuristic approach for this problem that trades accuracy for speed. An extensive experimental evaluation of our heuristic shows that our heuristic approach can be efficiently implemented in a query optimizer.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
World Wide Web Consortium Site. (W3C), http://www.w3c.org
2
 
3
4
 
5
M. Benedikt and I. Fundulaki. XML subtree queries: Specification and composition. In DBPL, 2005.
6
 
7
 
8
 
9
A. Deutsch and V. Tannen. Containment and integrity constraints for xpath. In KRDB, 2001.
 
10
 
11
12
 
13
V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword Proximity Search on XML Graphs. In ICDE, pages 367--378, 2003.
14
 
15
 
16
 
17
18
19
 
20
21
 
22
23
24
25
 
26
27
 
28
29
 
30
 
31
P. T. Wood. Minimising Simple XPath Expressions. In WebDB, pages 13--18, 2001.
 
32

Collaborative Colleagues:
Pawel Placek: colleagues
Dimitri Theodoratos: colleagues
Stefanos Souldatos: colleagues
Theodore Dalamagas: colleagues
Timos Sellis: colleagues