ABSTRACT
Like HTML documents, many XML documents reside on native file systems. Because XML data is irregular and verbose, it wastes disk space and network bandwidth. To overcome this verbosity, researchers have developed compressors for XML data. However, some XML compressors do not support querying compressed data, while others that do support it blindly encode tags and data values using predefined encoding methods. As a result, query performance on compressed XML data is degraded.

In this paper, we propose XPRESS, an XML compressor that supports direct and efficient evaluation of queries on compressed XML data. XPRESS adopts a novel encoding method, called reverse arithmetic encoding, which is intended for encoding the label paths of XML data, and applies different encoding methods depending on the types of data values. Experimental results with real-life data sets show that XPRESS achieves significant improvements in query performance on compressed XML data together with reasonable compression ratios. On average, the query performance of XPRESS is 2.83 times better than that of an existing XML compressor, and the compression ratio of XPRESS is 73%.
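The abstract names reverse arithmetic encoding but does not describe its mechanics. The sketch below is one plausible reading, not the authors' implementation. It assumes each tag owns a slice of [0, 1) proportional to its frequency, and that the interval of a label path is the last tag's slice narrowed by the interval of its parent path. Under that assumption, a path's interval lies inside the interval of every suffix of the path, so a suffix query reduces to an interval containment test. The function names and the sample frequencies are hypothetical.

```python
from fractions import Fraction


def tag_intervals(freqs):
    """Partition [0, 1) among tags in proportion to their frequencies."""
    total = sum(freqs.values())
    intervals, low = {}, Fraction(0)
    for tag, count in freqs.items():
        high = low + Fraction(count, total)
        intervals[tag] = (low, high)
        low = high
    return intervals


def path_interval(path, intervals):
    """Return the [low, high) interval for a label path such as ["a", "b", "c"].

    Assumed rule: the interval of the current tag is narrowed by the
    interval already computed for its parent path.
    """
    low, high = intervals[path[0]]
    for tag in path[1:]:
        t_low, t_high = intervals[tag]
        width = t_high - t_low
        low, high = t_low + width * low, t_low + width * high
    return low, high


def matches_suffix(element_path, query_suffix, intervals):
    """True if the element's interval lies inside the query suffix's interval."""
    e_low, e_high = path_interval(element_path, intervals)
    q_low, q_high = path_interval(query_suffix, intervals)
    return q_low <= e_low and e_high <= q_high


# Hypothetical tag frequencies, used only for illustration.
intervals = tag_intervals({"a": 1, "b": 1, "c": 2})
print(matches_suffix(["a", "b", "c"], ["b", "c"], intervals))
print(matches_suffix(["a", "b", "c"], ["a", "c"], intervals))
```

With this scheme, an element stored with the interval of `a/b/c` can be tested against a query such as `//b/c` by comparing two pairs of numbers, without decoding the path back into tag names.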