High performance XML parsing using parallel bit stream technology
Source: IBM Centre for Advanced Studies Conference archive
Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds
Ontario, Canada
SESSION: Compilers
Article No. 17
Year of Publication: 2008
Authors
Robert D. Cameron  Simon Fraser University
Kenneth S. Herdy  Simon Fraser University
Dan Lin  Simon Fraser University
Sponsors
IBM Toronto Software Lab
IBM Centers for Advanced Studies (CAS)
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1463788.1463811

ABSTRACT

Parabix (parallel bit streams for XML) is an open-source XML parser that employs the SIMD (single-instruction multiple-data) capabilities of modern-day commodity processors to deliver dramatic performance improvements over traditional byte-at-a-time parsing technology. Byte-oriented character data is first transformed to a set of 8 parallel bit streams, each stream comprising one bit per character code unit. Character validation, transcoding and lexical item stream formation are all then carried out in parallel using bitwise logic and shifting operations. Byte-at-a-time scanning loops in the parser are replaced by bit scan loops that can advance by as many as 64 positions with a single instruction.
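
The transposition and bit-scan steps described above can be illustrated with a minimal sketch. It is not taken from Parabix: it models each bit stream as an arbitrary-precision Python integer rather than a SIMD register, and the function names (`transpose_to_bit_streams`, `match_byte`, `scan_positions`) are hypothetical, chosen only for this illustration.

```python
def transpose_to_bit_streams(data):
    """Return 8 integers; bit i of streams[k] is bit k (MSB first) of data[i]."""
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            # k = 0 selects the most significant bit of the byte.
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << pos
    return streams


def match_byte(streams, length, target):
    """Bit stream with a 1 at every position whose byte equals target."""
    mask = (1 << length) - 1
    result = mask
    for k in range(8):
        want = (target >> (7 - k)) & 1
        # AND together each bit stream, or its complement, as the target dictates.
        result &= streams[k] if want else (~streams[k] & mask)
    return result


def scan_positions(marker):
    """Yield the positions of set bits, lowest first, by repeated bit scan."""
    while marker:
        lowest = marker & -marker       # isolate the lowest set bit
        yield lowest.bit_length() - 1   # its index (a count of trailing zeros)
        marker ^= lowest                # clear it and continue


text = b"<a><b/>x</a>"
streams = transpose_to_bit_streams(text)
left_angle = match_byte(streams, len(text), ord("<"))
print(list(scan_positions(left_angle)))  # [0, 3, 8]
```

In this sketch the character-class stream for `<` is built with eight bitwise operations over the whole input at once, and the scan loop jumps directly from one marked position to the next instead of inspecting every byte. A real implementation would process fixed-width blocks in SIMD registers and use a hardware bit-scan instruction in place of `bit_length`.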

A performance study comparing Parabix with the open-source Expat and Xerces parsers is carried out using the PAPI toolkit. Total CPU cycle counts, level 2 data cache misses and branch mispredictions are measured and compared for each parser. The performance of Parabix is further studied with a breakdown of the cycle counts across the core components of the parser. Prospects for further performance improvements are also outlined, with a particular emphasis on leveraging the intraregister parallelism of SIMD processing to enable intrachip parallelism on multicore architectures.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1. Performance Application Programming Interface. http://icl.cs.utk.edu/papi/.
2. Xerces C++ Parser. http://xerces.apache.org/xerces-c/.
3. Cameron, Robert D. u8u16 -- A High-Speed UTF-8 to UTF-16 Transcoder Using Parallel Bit Streams. Technical Report TR 2007-18, Simon Fraser University, Burnaby, BC, Canada, 2007.
5. Clark, James. The Expat XML Parser. http://expat.sourceforge.net/.
6. DuCharme, Bob. Documents vs. data, schemas vs. schemas. In XML 2004, Washington D.C., 2004.
7. Intel Corporation. IA-32 Intel Architecture Optimization Reference Manual, 2005.
10. Perkins, E., Kostoulas, M., Heifets, A., Matsa, M., and Mendelsohn, N. Performance Analysis of XML APIs. In XML 2005, Atlanta, Georgia, November 2005.
11. Pettersson, Michael. Linux x86 Performance-Monitoring Counters Driver. http://user.it.uu.se/mikpe/linux/perfctr.
14. Ross, Kenneth A. Efficient Hash Probes on Modern Processors. In Proceedings of the 23rd International Conference on Data Engineering (ICDE 2007), Istanbul, Turkey, 2007.
15. Zhao, Li and Bhuyan, Laxmi. Performance Evaluation and Acceleration for XML Data Parsing. In 9th Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW), Austin, Texas, 2006.

