ACM Home Page
Please provide us with feedback. Feedback
Processing XML streams with deterministic automata and stream indexes
Full text PdfPdf (717 KB)
Source ACM Transactions on Database Systems (TODS) archive
Volume 29 ,  Issue 4  (December 2004) table of contents
SECTION: Papers from the 2003 international conference on Database theory table of contents
Pages: 752 - 788  
Year of Publication: 2004
ISSN:0362-5915
Authors
Todd J. Green  University of Pennsylvania, Philadelphia, PA
Ashish Gupta  University of Washington, Seattle, WA
Gerome Miklau  University of Washington, Seattle, WA
Makoto Onizuka  NTT Cyber Space Laboratories, NTT Corporation, Kanagawa, Japan
Dan Suciu  University of Washington, Seattle, WA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 162,   Citation Count: 20
Additional Information:

appendices and supplements   abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1042046.1042051
What is a DOI?

APPENDICES and SUPPLEMENTS
Pdfgreen-appendix (188 KB)
Online Appendix to: Processing XML streams with deterministic automata and stream indexes


ABSTRACT

We consider the problem of evaluating a large number of XPath expressions on a stream of XML packets. We contribute two novel techniques. The first is to use a single Deterministic Finite Automaton (DFA). The contribution here is to show that the DFA can be used effectively for this problem: in our experiments we achieve a constant throughput, independently of the number of XPath expressions. The major issue is the size of the DFA, which, in theory, can be exponential in the number of XPath expressions. We provide a series of theoretical results and experimental evaluations that show that the lazy DFA has a small number of states, for all practical purposes. These results are of general interest in XPath processing, beyond stream processing. The second technique is the Streaming IndeX (SIX), which consists of adding a small amount of binary data to each XML packet that allows the query processor to achieve significant speedups. As an application of these techniques we describe the XML Toolkit (XMLTK), a collection of command-line tools providing highly scalable XML data processing.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
 
4
ANDIS/ISO. 1998. C++ Standard. ANDIS/ISO, Geneva, Switzerland.
 
5
Avila-Campillo, I., Green, T. J., Gupta, A., Onizuka, M., Raven, D., and Suciu, D. 2002. XMLTK: An XML toolkit for scalable XML stream processing. In Proceedings of PLANX.
 
6
Borne, K. D. n.d. NASA's astronomical data center. ADC XML resource page. Available online at http://xml.gsfc.nasa.gov/.
 
7
 
8
 
9
10
 
11
 
12
Corp., M. n.d. DIME---direct Internet message encapsulation specification index page. IETF Internet draft. Available online at http://msdn.microsoft.com/webservices/understanding/gxa/default.aspx.
13
 
14
Diao, Y. and Franklin, M. 2003. Query processing for high-volume XML message brokering. In Proceedings of VLDB (Berlin, Germany).
 
15
 
16
Florescu, D., Hillary, C., Kossmann, D., P.Lucas, Riccardi, F., Westmann, T., Carey, M., Sundararajan, A., and Agrawal, G. 2003. The bea/xqrl streaming xquery processor. In Proceedings of VLDB (Berlin, Germany). 997--1008.
 
17
 
18
19
 
20
 
21
Gupta, A. K., Halevy, A. Y., and Suciu, D. 2002. View selection for XML stream processing. In Proceedings of the International Workshop on the Web and Database (Web DB). 83--88.
22
23
 
24
Higgins, D. G., Fuchs, R., Stoehr, P. J., and Cameron, G. N. 1992. The EMBL data library. Nucleic Acids Res. 20, 2071--2074.
 
25
 
26
 
27
 
28
Ley, M. n.d. Computer science bibliography (dblp). Available online at http://dblp.uni- trier.de.
29
 
30
Ludaescher, B., Mukhopadhyay, P., and Papakonstantinou, Y. 2002. A transducer-based XML query processor. In Proceedings of VLDB. 227--238.
 
31
 
32
33
34
35
 
36
 
37
38
 
39
Thierry-Mieg, J. and Durbin, R. 1992. Syntactic Definitions for the ACEDB Data Base Manager. Tech. rep. MRC-LMB xx.92. MRC Laboratory for Molecular Biology, Cambridge, U.K.
40
 
41
Watson, B. W. 1993. A taxonomy of finite automata construction algorithms. Computing Science report 93/43. University of Technology Eindhoven, Eindhoven, The Netherlands.
 
42

CITED BY  20

Collaborative Colleagues:
Todd J. Green: colleagues
Ashish Gupta: colleagues
Gerome Miklau: colleagues
Makoto Onizuka: colleagues
Dan Suciu: colleagues