ABSTRACT
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib (the library behind gzip), a collection of datatype-specific compressors for simple data types, and, optionally, user-defined compressors for application-specific data types.
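To make the idea of combining zlib with datatype-specific compressors concrete, here is a minimal Python sketch. It is not XMill's implementation: grouping values by element tag, the function names `compress_xml`, `encode_text`, and `encode_uint`, and the variable-length integer encoding are all assumptions chosen for illustration. The sketch also discards the document structure, so it shows only the data-compression side and cannot reconstruct the original XML.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def encode_text(values):
    # Default encoder (assumption): newline-joined UTF-8 text.
    return "\n".join(values).encode("utf-8")


def encode_uint(values):
    # Hypothetical datatype-specific encoder: each non-negative integer
    # becomes a variable-length base-128 byte sequence.
    out = bytearray()
    for v in values:
        n = int(v)
        if n < 0:
            raise ValueError("encode_uint handles non-negative integers only")
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)  # more bytes follow
            else:
                out.append(byte)         # last byte of this integer
                break
    return bytes(out)


def compress_xml(xml_text, encoders=None):
    # Collect text values per element tag (one "container" per tag, an
    # assumption of this sketch), apply the encoder registered for that
    # tag, then compress each container separately with zlib.
    encoders = encoders or {}
    containers = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
    return {
        tag: zlib.compress(encoders.get(tag, encode_text)(values))
        for tag, values in containers.items()
    }


if __name__ == "__main__":
    doc = "<log><e><ip>10</ip><n>300</n></e><e><ip>11</ip><n>5</n></e></log>"
    for tag, blob in compress_xml(doc, {"n": encode_uint}).items():
        print(tag, len(blob), "bytes")
```

A user-defined compressor, in this sketch, is simply another function passed in the `encoders` mapping for the tags it should handle.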