A high-performance interpretive approach to schema-directed parsing
Source
International World Wide Web Conference
Proceedings of the 16th International Conference on World Wide Web
Banff, Alberta, Canada
SESSION: Parsing, normalizing, & storing XML
Pages: 1093-1114
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Morris Matsa, IBM Corporation
Eric Perkins, IBM Corporation
Abraham Heifets, IBM Corporation
Margaret Gaitatzes Kostoulas, IBM Corporation
Daniel Silva, IBM Corporation
Noah Mendelsohn, IBM Corporation
Michelle Leger, IBM Corporation
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1242572.1242719

ABSTRACT

XML delivers key advantages in interoperability due to its flexibility, expressiveness, and platform-neutrality. As XML has become a performance-critical aspect of the next generation of business computing infrastructure, however, it has become increasingly clear that XML parsing often carries a heavy performance penalty, and that current, widely-used parsing technologies are unable to meet the performance demands of an XML-based computing infrastructure. Several efforts have been made to address this performance gap through the use of grammar-based parser generation. While generated parsers have delivered significant performance improvements, adoption of the technology has been hindered by the complexity of compiling and deploying them. Through careful analysis of the operations required for parsing and validation, we have devised a set of specialized byte codes designed for the task of XML parsing and validation. These byte codes are designed to deliver the benefits of fine-grained composition of parsing and validation that make existing compiled parsers fast, while being coarse-grained enough to minimize interpreter overhead. This technique of using an interpretive, validating parser balances the need for performance against the requirements of simple tooling and robust, scalable infrastructure. Our approach is demonstrated with a specialized schema compiler, used to generate byte codes which in turn drive an interpretive parser. With almost as little tooling and deployment complexity as a traditional interpretive parser, the byte code-driven parser usually demonstrates performance within 20% of the fastest fully compiled solutions.
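The central idea of the abstract, a schema compiler that emits coarse-grained byte codes fusing scanning with validation so that an interpreter can execute them, can be illustrated with a toy sketch. Everything below is hypothetical: the opcode names (OPEN, CLOSE, INT, STR, REPEAT), the Interpreter class, and the hand-written program standing in for schema-compiler output are invented for illustration and are not the instruction set described in the paper. The input is also restricted to a tiny XML subset with no attributes, namespaces, comments, or entity references.

```python
"""Toy byte code-driven, schema-directed XML parser (illustrative sketch only)."""
import re

# Hypothetical instruction set, NOT the paper's byte codes:
#   ("OPEN", tag)              consume "<tag>"
#   ("CLOSE", tag)             consume "</tag>"
#   ("INT", tag)               consume "<tag>digits</tag>" and emit an int
#   ("STR", tag)               consume "<tag>text</tag>" and emit a str
#   ("REPEAT", tag, n, lo, hi) run the next n instructions while "<tag>" follows,
#                              requiring between lo and hi repetitions

INT_PATTERN = re.compile(r"[+-]?[0-9]+")


class ValidationError(Exception):
    pass


class Interpreter:
    def __init__(self, program):
        self.program = program

    def parse(self, text):
        self.text, self.pos, self.out = text, 0, []
        self._run(0, len(self.program))
        self._skip_ws()
        if self.pos != len(self.text):
            raise ValidationError(f"trailing content at offset {self.pos}")
        return self.out

    def _skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def _peek(self, literal):
        self._skip_ws()
        return self.text.startswith(literal, self.pos)

    def _expect(self, literal):
        # Schema-directed scanning: the expected tag comes from the program,
        # so the parser compares a known literal instead of tokenizing generically.
        if not self._peek(literal):
            raise ValidationError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def _leaf(self, tag):
        # One coarse-grained step: start tag, character data, and end tag.
        self._expect(f"<{tag}>")
        end = self.text.find("<", self.pos)
        if end < 0:
            raise ValidationError(f"unterminated <{tag}>")
        value = self.text[self.pos:end]
        self.pos = end
        self._expect(f"</{tag}>")
        return value

    def _run(self, pc, stop):
        while pc < stop:
            op = self.program[pc]
            kind = op[0]
            if kind == "OPEN":
                self._expect(f"<{op[1]}>")
            elif kind == "CLOSE":
                self._expect(f"</{op[1]}>")
            elif kind == "INT":
                raw = self._leaf(op[1]).strip()
                if not INT_PATTERN.fullmatch(raw):
                    raise ValidationError(f"<{op[1]}> is not an integer: {raw!r}")
                self.out.append((op[1], int(raw)))
            elif kind == "STR":
                self.out.append((op[1], self._leaf(op[1])))
            elif kind == "REPEAT":
                _, tag, n, lo, hi = op
                count = 0
                while count < hi and self._peek(f"<{tag}>"):
                    self._run(pc + 1, pc + 1 + n)  # execute the loop body
                    count += 1
                if count < lo:
                    raise ValidationError(f"expected at least {lo} <{tag}>, got {count}")
                pc += n  # skip the body; the pc += 1 below moves past it
            else:
                raise ValueError(f"unknown opcode {kind!r}")
            pc += 1


# Byte codes a hypothetical schema compiler might emit for:
#   <order> containing one integer <id> followed by 1 to 3 string <item> elements
ORDER_PROGRAM = [
    ("OPEN", "order"),
    ("INT", "id"),
    ("REPEAT", "item", 1, 1, 3),
    ("STR", "item"),
    ("CLOSE", "order"),
]

if __name__ == "__main__":
    doc = "<order> <id>42</id> <item>widget</item> <item>gadget</item> </order>"
    print(Interpreter(ORDER_PROGRAM).parse(doc))
    # prints [('id', 42), ('item', 'widget'), ('item', 'gadget')]
```

In this sketch each INT or STR instruction consumes a whole start tag, its text, and its end tag in a single dispatch, so the interpreter loop runs roughly once per element rather than once per lexical token; that is one way to read the abstract's goal of byte codes coarse-grained enough to keep interpreter overhead low. Because the schema lives in the program data rather than in generated native code, supporting a different schema means loading different byte codes, not compiling and deploying a new parser.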


