Prefiltering techniques for efficient XML document processing
Source: Document Engineering
Proceedings of the 2005 ACM symposium on Document engineering
Bristol, United Kingdom
SESSION: Document searching, document annotation, and document metadata
Pages: 149 - 158  
Year of Publication: 2005
ISBN:1-59593-240-2
Authors
Chia-Hsin Huang  National Taiwan University of Science and Technology, Taipei, Taiwan and Academia Sinica, Taipei, Taiwan
Tyng-Ruey Chuang  Academia Sinica, Taipei, Taiwan
Hahn-Ming Lee  National Taiwan University of Science and Technology, Taipei, Taiwan
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1096601.1096641

ABSTRACT

Document Object Model (DOM) and Simple API for XML (SAX) are the two major programming models for XML document processing. Each, however, has its own efficiency limitation. DOM assumes an in-core representation of XML documents, which can be problematic for large documents. SAX must scan the document linearly to locate the fragments of interest. Previously, we used tree-to-table mapping and indexing techniques to help answer structural queries over large XML documents or large collections of them. In this paper, we generalize those techniques into a prefiltering framework in which repeated access to large XML documents can be carried out efficiently within the existing DOM and SAX models. The prefiltering framework essentially uses a tiny search engine to locate useful fragments in the target XML documents by approximately executing the user's queries. Those fragments are gathered into a candidate-set XML document, which is returned to the user's DOM- or SAX-based applications for further processing. The result is a practical and efficient model of XML processing, especially when the XML documents are large and infrequently updated but frequently queried.
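
The prefiltering idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sample document, the function names build_index and prefilter, and the in-memory tag-name index are all assumptions made for illustration, standing in for the paper's tree-to-table mapping and "tiny search engine". The index is built once, a coarse query pulls matching fragments into a candidate-set document, and the application then applies its exact condition with ordinary DOM calls over that smaller document.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document; the element names are illustrative only.
SOURCE = """<library>
  <book year="2001"><title>XML Basics</title></book>
  <journal year="2003"><title>GIS Notes</title></journal>
  <book year="2005"><title>Streaming XML</title></book>
</library>"""


def build_index(xml_text):
    """One-time pass: map each tag name to the serialized elements bearing it.

    Assumption: a real prefilter would keep this index on disk so the
    source document is not reparsed for every query.
    """
    index = {}
    for elem in ET.fromstring(xml_text).iter():
        fragment = ET.tostring(elem, encoding="unicode").strip()
        index.setdefault(elem.tag, []).append(fragment)
    return index


def prefilter(index, tag):
    """Approximate query: wrap all fragments with this tag in one document."""
    return "<candidates>" + "".join(index.get(tag, [])) + "</candidates>"


index = build_index(SOURCE)            # built once, reused across queries
candidate_doc = prefilter(index, "book")

# The application applies its exact condition with ordinary DOM calls,
# but only over the candidate set rather than the whole source document.
dom = minidom.parseString(candidate_doc)
titles = [
    book.getElementsByTagName("title")[0].firstChild.data
    for book in dom.getElementsByTagName("book")
    if int(book.getAttribute("year")) > 2002
]
print(titles)  # prints ['Streaming XML']
```

The prefilter here is deliberately coarse: it matches on tag name only and leaves the year comparison to the DOM application, mirroring the abstract's split between approximate query execution and further processing. The same candidate-set string could equally be fed to a SAX parser with xml.sax.parseString.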



