| ROX: run-time optimization of XQueries |
| Full text |
Pdf
(660 KB)
|
Source
|
International Conference on Management of Data
archive
Proceedings of the 35th SIGMOD international conference on Management of data
table of contents
Providence, Rhode Island, USA
SESSION: Research session 16: query processing on semi-structured data
table of contents
Pages 615-626
Year of Publication: 2009
ISBN:978-1-60558-551-2
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 32, Downloads (12 Months): 142, Citation Count: 0
|
|
|
ABSTRACT
Optimization of complex XQueries combining many XPath steps and joins is currently hindered by the absence of good cardinality estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated joins and selections. In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute, materialize partial results, and use sampling based estimation techniques to observe the characteristics of intermediates. The proposed technique takes as input a Join Graph where the edges are either equi-joins or XPath steps, and the execution environment provides value- and structural-join algorithms, as well as structural and value-based indices. While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. Our approach deals with these issues by limiting the run-time search space to so-called "zero-investment algorithms for which sampling can be guaranteed to be strictly linear in sample size. All operators and XML value indices used by ROX for sampling have the zero-investment property. We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
|
 |
4
|
Peter Boncz , Torsten Grust , Maurice van Keulen , Stefan Manegold , Jan Rittinger , Jens Teubner, MonetDB/XQuery: a fast XQuery processor powered by a relational engine, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
[doi> 10.1145/1142473.1142527]
|
 |
5
|
|
 |
6
|
|
 |
7
|
Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, On random sampling over joins, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.263-274, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
8
|
Zhiyuan Chen , H. V. Jagadish , Flip Korn , Nick Koudas , S. Muthukrishnan , Raymond T. Ng , Divesh Srivastava, Counting Twig Matches in a Tree, Proceedings of the 17th International Conference on Data Engineering, p.595-604, April 02-06, 2001
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
D. Fisher and S. Maneth. Structural Selectivity Estimation for XML Documents. In ICDE, 2007.
|
 |
14
|
Juliana Freire , Jayant R. Haritsa , Maya Ramanath , Prasan Roy , Jérôme Siméon, StatiX: making XML count, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 03-06, 2002, Madison, Wisconsin
[doi> 10.1145/564691.564713]
|
| |
15
|
|
 |
16
|
|
| |
17
|
T. Grust. Purely Relational FLWORs. In XIME-P, 2005
|
| |
18
|
T. Grust, M. Mayr, and J. Rittinger. XQuery Join Graph Isolation. In ICDE, 2009. arXiv:0810.4809.
|
 |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
J. Hidders, P. Michiels, J. Siméon, and R. Vercammen. How To Recognize Different Kinds of Tree Patterns from Quite a Long Way Away. In PLAN-X, 2007.
|
 |
23
|
|
 |
24
|
|
 |
25
|
Volker Markl , Vijayshankar Raman , David Simmen , Guy Lohman , Hamid Pirahesh , Miso Cilimdzic, Robust query processing through progressive optimization, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007642]
|
| |
26
|
F. Olken and D. Rotem. Random Sampling from Databases -A Survey. Statistics and Computing, 5:25--42, 1995.
|
 |
27
|
Patrick O'Neil , Elizabeth O'Neil , Shankar Pal , Istvan Cseri , Gideon Schaller , Nigel Westbury, ORDPATHs: insert-friendly XML node labels, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
[doi> 10.1145/1007568.1007686]
|
 |
28
|
|
| |
29
|
|
| |
30
|
Wei Wang , Haifeng Jiang , Hongjun Lu , Jeffrey Xu Yu, Bloom histogram: path selectivity estimation for XML data with updates, Proceedings of the Thirtieth international conference on Very large data bases, p.240-251, August 31-September 03, 2004, Toronto, Canada
|
| |
31
|
|
|