|
ABSTRACT
XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, and twig matching is achieved by: (i) using structural join algorithms to match the binary relationships against the XML database, and (ii) stitching together these basic matches. A limitation of this approach for matching twig patterns is that intermediate result sizes can get large, even when the input and output sizes are more manageable.In this paper, we propose a novel holistic twig join algorithm, TwigStack, for matching an XML query twig pattern. Our technique uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern. When the twig pattern uses only ancestor-descendant relationships between elements, TwigStack is I/O and CPU optimal among all sequential algorithms that read the entire input: it is linear in the sum of sizes of the input lists and the final result list, but independent of the sizes of intermediate results. We then show how to use (a modification of) B-trees, along with TwigStack, to match query twig patterns in sub-linear time. Finally, we complement our analysis with experimental results on a range of real and synthetic data, and query twig patterns.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
S. Boag, D. Chamberlin, M. Fernandez, D. Florescu, J. Robie, J. Simeon, and M. Stefanescu XQuery 1.0: An XML Query Language. W3C Working Draft. Available from http://www.w3.org/TR/xquery, Dec. 2001.
|
| |
3
|
N. Bruno, N. Koudas, D. Srivastava. Holistic Twig Joins: Optimal XML Pattern Matching. Technical Report. Columbia University. March 2002.
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
 |
7
|
|
| |
8
|
A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. XML-QL: A query language for XML. Available from http://www.w3.org/TR/NOTE-xml-ql., 1998.
|
| |
9
|
D. DeWitt, J. Naughton, and D. Schneider. An evaluation of non equijoin algorithms. Proceedings of ACM SIGMOD, 1991.
|
| |
10
|
|
| |
11
|
|
| |
12
|
D. Florescu and D. Kossman. Storing and querying XML data using an RDMBS. IEEE Data Engineering Bulletin, 22(3):27-34, 1999.
|
 |
13
|
|
 |
14
|
|
 |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
U. of Washington. The Tukwila system. Available from http://data.cs.washington.edu/integration/tukwila/.
|
| |
19
|
U. of Wisconsin. The Niagara system. Available from http://www.cs.wisc.edu/niagara/.
|
 |
20
|
|
 |
21
|
Dallan Quass , Jennifer Widom , Roy Goldman , Kevin Haas , Qingshan Luo , Jason McHugh , Svetlozar Nestorov , Anand Rajaraman , Hugo Rivero , Serge Abiteboul , Jeff Ullman , Janet Wiener, LORE: a Lightweight Object REpository for semistructured data, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.549, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
22
|
|
| |
23
|
Jayavel Shanmugasundaram , Eugene J. Shekita , Rimon Barr , Michael J. Carey , Bruce G. Lindsay , Hamid Pirahesh , Berthold Reinwald, Efficiently Publishing Relational Data as XML Documents, Proceedings of the 26th International Conference on Very Large Data Bases, p.65-76, September 10-14, 2000
|
| |
24
|
Jayavel Shanmugasundaram , Kristin Tufte , Chun Zhang , Gang He , David J. DeWitt , Jeffrey F. Naughton, Relational Databases for Querying XML Documents: Limitations and Opportunities, Proceedings of the 25th International Conference on Very Large Data Bases, p.302-314, September 07-10, 1999
|
| |
25
|
XMach-1. Available from http://dbs.uni-leipzig.de/en/projekte/XML/XmlBenchmarking.html.
|
| |
26
|
The XML benchmark project. Available from http://www.xml-benchmark.org.
|
 |
27
|
Chun Zhang , Jeffrey Naughton , David DeWitt , Qiong Luo , Guy Lohman, On supporting containment queries in relational database management systems, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.425-436, May 21-24, 2001, Santa Barbara, California, United States
|
CITED BY 151
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Beverly Yang , Marcus Fontoura , Eugene Shekita , Sridhar Rajagopalan , Kevin Beyer, Virtual cursors for XML joins, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
|
|
|
Yi Chen , George A. Mihaila , Susan B. Davidson , Sriram Padmanabhan, EXPedite: a system for encoded XML processing, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
|
|
|
|
|
|
|
|
|
H. V. Jagadish , Laks V. S. Lakshmanan , Monica Scannapieco , Divesh Srivastava , Nuwee Wiwatwattana, Colorful XML: one hierarchy isn't enough, Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 13-18, 2004, Paris, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ning Zhang , Peter J. Haas , Vanja Josifovski , Guy M. Lohman , Chun Zhang, Statistical learning techniques for costing XML queries, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
|
|
|
|
|
|
Kevin Beyer , Roberta J. Cochrane , Vanja Josifovski , Jim Kleewein , George Lapis , Guy Lohman , Bob Lyle , Fatma Özcan , Hamid Pirahesh , Normen Seemann , Tuong Truong , Bert Van der Linden , Brian Vickery , Chun Zhang, System RX: one part relational, one part XML, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wei Wang , Haifeng Jiang , Hongzhi Wang , Xuemin Lin , Hongjun Lu , Jianzhong Li, Efficient processing of XML path queries using the disk-based F&B Index, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
|
|
|
|
|
|
Barbara Catania , Beng Chin Ooi , Wenqiang Wang , Xiaoling Wang, Lazy XML updates: laziness as a virtue, of update and structural join efficiency, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
|
|
|
|
|
|
|
|
|
|
|
|
William M. Shui , Franky Lam , Damien K. Fisher , Raymond K. Wong, Querying and maintaining ordered XML data using relational databases, Proceedings of the sixteenth Australasian database conference, p.85-94, January 01, 2005, Newcastle, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
A. Balmin , T. Eliaz , J. Hornibrook , L. Lim , G. M. Lohman , D. Simmen , M. Wang , C. Zhang, Cost-based optimization in DB2 XML, IBM Systems Journal, v.45 n.2, p.299-319, January 2006
|
|
|
K. Beyer , R. Cochrane , M. Hvizdos , V. Josifovski , J. Kleewein , G. Lapis , G. Lohman , R. Lyle , M. Nicola , F. Özcan , H. Pirahesh , N. Seemann , A. Singh , T. Truong , R. C. Van der Linden , B. Vickery , C. Zhang , G. Zhang, DB2 goes hybrid: integratng native XML and XQuery with relational data and SQL, IBM Systems Journal, v.45 n.2, p.271-298, January 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Xiaoying Wu , Stefanos Souldatos , Dimitri Theodoratos , Theodore Dalamagas , Timos Sellis, Efficient evaluation of generalized path pattern queries on XML data, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
|
|
|
Peter Boncz , Torsten Grust , Maurice van Keulen , Stefan Manegold , Jan Rittinger , Jens Teubner, MonetDB/XQuery: a fast XQuery processor powered by a relational engine, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
K. Selçuk Candan , Wang-Pin Hsiung , Songting Chen , Junichi Tatemura , Divyakant Agrawal, AFilter: adaptable XML filtering with prefix-caching suffix-clustering, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
Songting Chen , Hua-Gang Li , Junichi Tatemura , Wang-Pin Hsiung , Divyakant Agrawal , K. Selçuk Candan, Twig2Stack: bottom-up processing of generalized-tree-pattern queries over XML documents, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
François Bry , Tim Furche , Benedikt Linse , Andreas Schroeder, Efficient evaluation of n-ary conjunctive queries over trees and graphs, Proceedings of the eighth ACM international workshop on Web information and data management, November 10-10, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Alan Halverson , Josef Burger , Leonidas Galanis , Ameet Kini , Rajasekar Krishnamurthy , Ajith Nagaraja Rao , Feng Tian , Stratis D. Viglas , Yuan Wang , Jeffrey F. Naughton , David J. DeWitt, Mixed mode XML query processing, Proceedings of the 29th international conference on Very large data bases, p.225-236, September 09-12, 2003, Berlin, Germany
|
|
|
Zhimin Chen , H. V. Jagadish , Laks V. S. Lakshmanan , Stelios Paparizos, From tree patterns to generalized tree patterns: on efficient evaluation of XQuery, Proceedings of the 29th international conference on Very large data bases, p.237-248, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
Haifeng Jiang , Wei Wang , Hongjun Lu , Jeffrey Xu Yu, Holistic twig joins on indexed XML documents, Proceedings of the 29th international conference on Very large data bases, p.273-284, September 09-12, 2003, Berlin, Germany
|
|
|
|
|
|
|
|
|
|
|
|
Shu-Yao Chien , Zografoula Vagena , Donghui Zhang , Vassilis J. Tsotras , Carlo Zaniolo, Efficient structural joins on indexed XML documents, Proceedings of the 28th international conference on Very Large Data Bases, p.263-274, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Stefanos Souldatos , Xiaoying Wu , Dimitri Theodoratos , Theodore Dalamagas , Timos Sellis, Evaluation of partial path queries on xml data, Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, November 06-10, 2007, Lisbon, Portugal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jiangming Yang , Haixun Wang , Ning Gu , Yiming Liu , Chunsong Wang , Qiwei Zhang, Lock-free consistency control for web 2.0 applications, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Cheng Luo , Zhewei Jiang , Wen-Chi Hou , Feng Yu , Qiang Zhu, A sampling approach for XML query selectivity estimation, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
K. Selçuk Candan , Mehmet E. Dönderler , Yan Qi , Jaikannan Ramamoorthy , Jong W. Kim, FMware: middleware for efficient filtering and matching of XML messages with local data, Proceedings of the ACM/IFIP/USENIX 2006 International Conference on Middleware, November 01-01, 2006, Melbourne, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Riham Abdel Kader , Peter Boncz , Stefan Manegold , Maurice van Keulen, ROX: run-time optimization of XQueries, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
George H. L. Fletcher , Dirk Van Gucht , Yuqing Wu , Marc Gyssens , Sofía Brenes , Jan Paredaens, A methodology for coupling fragments of XPath with structural indexes for XML documents, Information Systems, v.34 n.7, p.657-670, November, 2009
|
|
|
|
|
|
|
|
|
|
|
|
Yongjian Wang , Yinan Ren , Ting Chen , Yuanqiang Huang , Zhongzhi Luan , Zhongxin Wu , Depei Qian, Cesar-FD: An Effective Stateful Fault Detection Mechanism in Drug Discovery Grid, Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, p.580-585, May 18-21, 2009
|
|