ABSTRACT
We consider the problem of evaluating a large number of XPath expressions on a stream of XML packets. We contribute two novel techniques. The first is to use a single Deterministic Finite Automaton (DFA). The contribution here is to show that the DFA can be used effectively for this problem: in our experiments we achieve a constant throughput, independently of the number of XPath expressions. The major issue is the size of the DFA, which, in theory, can be exponential in the number of XPath expressions. We provide a series of theoretical results and experimental evaluations that show that the lazy DFA has a small number of states, for all practical purposes. These results are of general interest in XPath processing, beyond stream processing. The second technique is the Streaming IndeX (SIX), which consists of adding a small amount of binary data to each XML packet that allows the query processor to achieve significant speedups. As an application of these techniques we describe the XML Toolkit (XMLTK), a collection of command-line tools providing highly scalable XML data processing.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
|
| |
4
|
ANDIS/ISO. 1998. C++ Standard. ANDIS/ISO, Geneva, Switzerland.
|
| |
5
|
Avila-Campillo, I., Green, T. J., Gupta, A., Onizuka, M., Raven, D., and Suciu, D. 2002. XMLTK: An XML toolkit for scalable XML stream processing. In Proceedings of PLANX.
|
| |
6
|
Borne, K. D. n.d. NASA's astronomical data center. ADC XML resource page. Available online at http://xml.gsfc.nasa.gov/.
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
Corp., M. n.d. DIME---direct Internet message encapsulation specification index page. IETF Internet draft. Available online at http://msdn.microsoft.com/webservices/understanding/gxa/default.aspx.
|
 |
13
|
|
| |
14
|
Diao, Y. and Franklin, M. 2003. Query processing for high-volume XML message brokering. In Proceedings of VLDB (Berlin, Germany).
|
| |
15
|
|
| |
16
|
Florescu, D., Hillary, C., Kossmann, D., P.Lucas, Riccardi, F., Westmann, T., Carey, M., Sundararajan, A., and Agrawal, G. 2003. The bea/xqrl streaming xquery processor. In Proceedings of VLDB (Berlin, Germany). 997--1008.
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
|
| |
21
|
Gupta, A. K., Halevy, A. Y., and Suciu, D. 2002. View selection for XML stream processing. In Proceedings of the International Workshop on the Web and Database (Web DB). 83--88.
|
 |
22
|
|
 |
23
|
|
| |
24
|
Higgins, D. G., Fuchs, R., Stoehr, P. J., and Cameron, G. N. 1992. The EMBL data library. Nucleic Acids Res. 20, 2071--2074.
|
| |
25
|
|
| |
26
|
|
| |
27
|
|
| |
28
|
Ley, M. n.d. Computer science bibliography (dblp). Available online at http://dblp.uni- trier.de.
|
 |
29
|
|
| |
30
|
Ludaescher, B., Mukhopadhyay, P., and Papakonstantinou, Y. 2002. A transducer-based XML query processor. In Proceedings of VLDB. 227--238.
|
| |
31
|
|
| |
32
|
|
 |
33
|
|
 |
34
|
|
 |
35
|
|
| |
36
|
|
| |
37
|
|
 |
38
|
|
| |
39
|
Thierry-Mieg, J. and Durbin, R. 1992. Syntactic Definitions for the ACEDB Data Base Manager. Tech. rep. MRC-LMB xx.92. MRC Laboratory for Molecular Biology, Cambridge, U.K.
|
 |
40
|
|
| |
41
|
Watson, B. W. 1993. A taxonomy of finite automata construction algorithms. Computing Science report 93/43. University of Technology Eindhoven, Eindhoven, The Netherlands.
|
| |
42
|
|
CITED BY 20
|
|
Fang Yu , Zhifeng Chen , Yanlei Diao , T. V. Lakshman , Randy H. Katz, Fast and memory-efficient regular expression matching for deep packet inspection, Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems, December 03-05, 2006, San Jose, California, USA
|
|
|
|
|
|
K. Selçuk Candan , Wang-Pin Hsiung , Songting Chen , Junichi Tatemura , Divyakant Agrawal, AFilter: adaptable XML filtering with prefix-caching suffix-clustering, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mingsheng Hong , Alan J. Demers , Johannes E. Gehrke , Christoph Koch , Mirek Riedewald , Walker M. White, Massively multi-query join processing in publish/subscribe systems, Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 11-14, 2007, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|