|
ABSTRACT
XML filtering systems aim to provide fast, on-the-fly matching of XML-encoded data to large numbers of query specifications containing constraints on both structure and content. It is now well accepted that approaches using event-based parsing and Finite State Machines (FSMs) can provide the basis for highly scalable structure-oriented XML filtering systems. The XFilter system [Altinel and Franklin 2000] was the first published FSM-based XML filtering approach. XFilter used a separate FSM per path query and a novel indexing mechanism to allow all of the FSMs to be executed simultaneously during the processing of a document. Building on the insights of the XFilter work, we describe a new method, called "YFilter" that combines all of the path queries into a single Nondeterministic Finite Automaton (NFA). YFilter exploits commonality among queries by merging common prefixes of the query paths such that they are processed at most once. The resulting shared processing provides tremendous improvements in structure matching performance but complicates the handling of value-based predicates.In this article, we first describe the XFilter and YFilter approaches and present results of a detailed performance comparison of structure matching for these algorithms as well as a hybrid approach. The results show that the path sharing employed by YFilter can provide order-of-magnitude performance benefits. We then propose two alternative techniques for extending YFilter's shared structure matching with support for value-based predicates, and compare the performance of these two techniques. The results of this latter study demonstrate some key differences between shared XML filtering and traditional database query processing. Finally, we describe how the YFilter approach is extended to handle more complicated queries containing nested path expressions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Demet Aksoy , Mehmet Altinel , Rahul Bose , Ugur Çetintemel , Michael J. Franklin , Jane Wang , Stanley B. Zdonik, Research in Data Broadcast and Dissemination, Proceedings of the First International Conference on Advanced Multimedia Content Processing, p.194-207, November 09-11, 1998
|
 |
2
|
Mehmet Altinel , Demet Aksoy , Thomas Baby , Michael Franklin , William Shapiro , Stan Zdonik, DBIS-toolkit: adaptable middleware for large scale data delivery, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.544-546, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
3
|
|
| |
4
|
Apache XML Project. 1999. Xerces Java parser 1.2.3 Release. http://xml.apache.org/xerces-j/index.html.
|
 |
5
|
|
| |
6
|
Bruno, N., Gravano, L., Koudas, N., and Srivastava, D. 2003. Navigation- vs. index-based XML multi-query processing. In Proceedings of ICDE 2003. IEEE Computer Society Press, Los Alamitos, Calif.
|
 |
7
|
|
| |
8
|
Busse, R., Carey, M., Florescu, D., Kersten, M., Manolescu, I., Schmidt, A., and Waas, F. 2001. Xmark: An XML benchmark project. http://monetdb.cwi.nl/xml/index.html.
|
| |
9
|
|
| |
10
|
Chamberlin, D., Fankhauser, P., Florescu, D., Marchiori, M., and Robie, J. 2002. XML query use cases. W3C Working Draft. http://www.w3.org/TR/2002/WD-xmlquery-use-cases-20020430.
|
| |
11
|
|
| |
12
|
|
 |
13
|
Jianjun Chen , David J. DeWitt , Feng Tian , Yuan Wang, NiagaraCQ: a scalable continuous query system for Internet databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.379-390, May 15-18, 2000, Dallas, Texas, United States
|
| |
14
|
Clark, J. 1999. XSL transformations (XSLT)---version 1.0. http://www.w3.org/TR/xslt.
|
| |
15
|
Clark, J., and DeRose, S. 1999. XML path language (XPath)---version 1.0. http://www.w3.org/TR/xpath.
|
| |
16
|
Cover, R. 1999. The SGML/XML Web Page. http://www.w3.org/TR/xslt.
|
| |
17
|
DeRose, S., Daniel Jr., R., and Maler, E. 1999. XML pointer language (XPointer). http://www.w3.org/TR/WD-xptr.
|
| |
18
|
Diaz, A. L., and Lovell, D. 1999. XML generator. http://www.alphaworks.ibm.com/tech/xmlgenerator.
|
 |
19
|
Françoise Fabret , H. Arno Jacobsen , François Llirbat , Joăo Pereira , Kenneth A. Ross , Dennis Shasha, Filtering algorithms and implementation for very fast publish/subscribe systems, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.115-126, May 21-24, 2001, Santa Barbara, California, United States
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
Ives, Z., Levy, A., and Weld, D. 2000. Efficient evaluation of regular path expressions on streaming XML data. Technical Report, University of Washington, Seattle, Wash.
|
| |
26
|
Kay, M. 2001. Saxon: the XSLT processor. http://users.iclway.co.uk/mhkay/saxon/.
|
| |
27
|
|
| |
28
|
Ley, M. 2001. DBLP DTD. http://www.acm.org/sigmod/dblp/db/about/dblp.dtd.
|
| |
29
|
|
 |
30
|
|
| |
31
|
|
 |
32
|
Benjamin Nguyen , Serge Abiteboul , Grégory Cobena , Mihaí Preda, Monitoring XML data on the Web, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.437-448, May 21-24, 2001, Santa Barbara, California, United States
|
 |
33
|
Bahattin Ozen , Ozgur Kilic , Mehmet Altinel , Asuman Dogac, Highly personalized information delivery to mobile clients, Proceedings of the 2nd ACM international workshop on Data engineering for wireless and mobile access, p.35-42, May 2001, Santa Barbara, California, United States
[doi> 10.1145/376868.376891]
|
| |
34
|
|
| |
35
|
|
| |
36
|
Sax Project Organization. 2001. SAX: Simple API for XML. http://www. saxproject.org.
|
| |
37
|
|
 |
38
|
Michael Stonebraker , Anant Jhingran , Jeffrey Goh , Spyros Potamianos, On rules, procedure, caching and views in data base systems, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.281-290, May 23-26, 1990, Atlantic City, New Jersey, United States
|
| |
39
|
Sun Microsystems, Inc. 2001. Java XML pack. Winter 01 update release. http://java.sun.com/xml/downloads/javaxmlpack.html.
|
 |
40
|
Douglas Terry , David Goldberg , David Nichols , Brian Oki, Continuous queries over append-only databases, Proceedings of the 1992 ACM SIGMOD international conference on Management of data, p.321-330, June 02-05, 1992, San Diego, California, United States
|
| |
41
|
|
 |
42
|
|
| |
43
|
Wutka. 2000. DTD parser. http://www.wutka.com/dtdparser.html.
|
 |
44
|
|
 |
45
|
|
CITED BY 46
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fang Yu , Zhifeng Chen , Yanlei Diao , T. V. Lakshman , Randy H. Katz, Fast and memory-efficient regular expression matching for deep packet inspection, Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems, December 03-05, 2006, San Jose, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
Praveen Rao , Justin Cappos , Varun Khare , Bongki Moon , Beichuan Zhang, Net-χ: unified data-centric internet services, Proceedings of the 3rd USENIX international workshop on Networking meets databases, p.1-6, April 10, 2007, Cambridge, MA
|
|
|
|
|
|
Song Wang , Hong Su , Ming Li , Mingzhu Wei , Shoushen Yang , Drew Ditto , Elke A. Rundensteiner , Murali Mani, R-SOX: runtime semantic query optimization over XML streams, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
K. Selçuk Candan , Wang-Pin Hsiung , Songting Chen , Junichi Tatemura , Divyakant Agrawal, AFilter: adaptable XML filtering with prefix-caching suffix-clustering, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mingsheng Hong , Alan J. Demers , Johannes E. Gehrke , Christoph Koch , Mirek Riedewald , Walker M. White, Massively multi-query join processing in publish/subscribe systems, Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 11-14, 2007, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
John Keeney , Dominik Roblek , Dominic Jones , David Lewis , Declan O'Sullivan, Extending Siena to support more expressive and flexible subscriptions, Proceedings of the second international conference on Distributed event-based systems, July 01-04, 2008, Rome, Italy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Junichi Tatemura , Oliver Po , Arsany Sawires , Divyakant Agrawal , K. Selçuk Candan, WReX: a scalable middleware architecture to enable XML caching for web-services, Proceedings of the ACM/IFIP/USENIX 2005 International Conference on Middleware, p.124-143, November 01-01, 2005, Grenoble, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|