| Approximation algorithms for grammar-based compression |
| Full text |
Pdf
(721 KB)
|
| Source
|
Symposium on Discrete Algorithms
archive
Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
table of contents
San Francisco, California
Pages: 205 - 212
Year of Publication: 2002
ISBN:0-89871-513-X
|
|
Authors
|
|
Eric Lehman
|
MIT Laboratory for Computer Science, Cambridge, MA
|
|
Abhi Shelat
|
MIT Laboratory for Computer Science, Cambridge, MA
|
|
| Sponsor |
|
| Publisher |
Society for Industrial and Applied Mathematics
Philadelphia, PA, USA
|
| Bibliometrics |
Downloads (6 Weeks): 7, Downloads (12 Months): 64, Citation Count: 4
|
|
|
ABSTRACT
Several recently-proposed data compression algorithms are based on the idea of representing a string by a context-free grammar. Most of these algorithms are known to be asymptotically optimal with respect to a stationary ergodic source and to achieve a low redundancy rate. However, such results do not reveal how effectively these algorithms exploit the grammar-model itself; that is, are the compressed strings produced as small as possible? We address this issue by analyzing the approximation ratio of several algorithms, that is, the maximum ratio between the size of the generated grammar and the smallest possible grammar over all inputs. On the negative side, we show that every polynomial-time grammar-compression algorithm has approximation ratio at least 8569/8568 unless P = NP. Moreover, achieving an approximation ratio of o(log n/log log n) would require progress on an algebraic problem in a well-studied area. We then upper and lower bound approximation ratios for the following four previously-proposed grammar-based compression algorithms: SEQUENTIAL, BISECTION, GREEDY, and LZ78, each of which employs a distinct approach to compression. These results seem to indicate that there is much room to improve grammar-based compression algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
P. Berman and M. Karpinski. On Some Tighter Inapproximability Results, Further Improvements. ECCC 1998.
|
| |
3
|
D. Bleichenbacher. Efficiency and Security of Cryptosystems Based on Number Theory. PhD thesis, Swiss Federal Institute of Technology, 1996.
|
| |
4
|
|
| |
5
|
P. Downey, B. Leong, and R. Sethi. Computing Sequences with Addition Chains. SIAM Journal on Computing, 10(3):638-646, August 1981.
|
| |
6
|
J. C. Kieffer and E. Yang. Grammar-Based Codes: a New Class of Universal Lossless Source Codes. IEEE Transactions on Information Theory, vol. 46 (2000), pp. 737-754.
|
| |
7
|
J. C. Kieffer, E. Yang, G. J. Nelson, P. Cosman. Universal Lossless Compression via Multilevel Pattern Matching. IEEE Transactions on Information Theory, vol. 46 (2000), pp. 1227-1245.
|
| |
8
|
D. Knuth. Seminumerical Algorithms. Addison-Wesley, 1981, pp. 441-462.
|
| |
9
|
C. Nevill-Manning. Inferring Sequential Structure. PhD thesis, University of Waikato, 1996.
|
| |
10
|
|
| |
11
|
T. A. Welch. A Technique for High Performance Data Compression. IEEE Computer, vol. 17 (June 1984), pp. 8-19.
|
| |
12
|
E. Yang and J. C. Kieffer. Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform. IEEE Transactions on Information Theory, vol. 46 (2000), pp. 755-777.
|
| |
13
|
J. Ziv and A. Lempel. A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, vol. 23 (1977), pp. 337-343.
|
| |
14
|
J. Ziv and A. Lempel. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory, vol. 24 (1978), pp. 530-536.
|
CITED BY 4
|
|
Moses Charikar , Eric Lehman , Ding Liu , Rina Panigrahy , Manoj Prabhakaran , April Rasala , Amit Sahai , abhi shelat, Approximating the smallest grammar: Kolmogorov complexity in natural models, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
Joachim Viide , Aki Helin , Marko Laakso , Pekka Pietikäinen , Mika Seppänen , Kimmo Halunen , Rauli Puuperä , Juha Röning, Experiences with model inference assisted fuzzing, Proceedings of the 2nd conference on USENIX Workshop on offensive technologies, p.1-6, July 28, 2008, San Jose, CA
|
|