ACM Home Page
Please provide us with feedback. Feedback
Approximation algorithms for grammar-based compression
Full text PdfPdf (721 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
San Francisco, California
Pages: 205 - 212  
Year of Publication: 2002
ISBN:0-89871-513-X
Authors
Eric Lehman  MIT Laboratory for Computer Science, Cambridge, MA
Abhi Shelat  MIT Laboratory for Computer Science, Cambridge, MA
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 60,   Citation Count: 4
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

Several recently-proposed data compression algorithms are based on the idea of representing a string by a context-free grammar. Most of these algorithms are known to be asymptotically optimal with respect to a stationary ergodic source and to achieve a low redundancy rate. However, such results do not reveal how effectively these algorithms exploit the grammar-model itself; that is, are the compressed strings produced as small as possible? We address this issue by analyzing the approximation ratio of several algorithms, that is, the maximum ratio between the size of the generated grammar and the smallest possible grammar over all inputs. On the negative side, we show that every polynomial-time grammar-compression algorithm has approximation ratio at least 8569/8568 unless P = NP. Moreover, achieving an approximation ratio of o(log n/log log n) would require progress on an algebraic problem in a well-studied area. We then upper and lower bound approximation ratios for the following four previously-proposed grammar-based compression algorithms: SEQUENTIAL, BISECTION, GREEDY, and LZ78, each of which employs a distinct approach to compression. These results seem to indicate that there is much room to improve grammar-based compression algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
P. Berman and M. Karpinski. On Some Tighter Inapproximability Results, Further Improvements. ECCC 1998.
 
3
D. Bleichenbacher. Efficiency and Security of Cryptosystems Based on Number Theory. PhD thesis, Swiss Federal Institute of Technology, 1996.
 
4
 
5
P. Downey, B. Leong, and R. Sethi. Computing Sequences with Addition Chains. SIAM Journal on Computing, 10(3):638-646, August 1981.
 
6
J. C. Kieffer and E. Yang. Grammar-Based Codes: a New Class of Universal Lossless Source Codes. IEEE Transactions on Information Theory, vol. 46 (2000), pp. 737-754.
 
7
J. C. Kieffer, E. Yang, G. J. Nelson, P. Cosman. Universal Lossless Compression via Multilevel Pattern Matching. IEEE Transactions on Information Theory, vol. 46 (2000), pp. 1227-1245.
 
8
D. Knuth. Seminumerical Algorithms. Addison-Wesley, 1981, pp. 441-462.
 
9
C. Nevill-Manning. Inferring Sequential Structure. PhD thesis, University of Waikato, 1996.
 
10
 
11
T. A. Welch. A Technique for High Performance Data Compression. IEEE Computer, vol. 17 (June 1984), pp. 8-19.
 
12
E. Yang and J. C. Kieffer. Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform. IEEE Transactions on Information Theory, vol. 46 (2000), pp. 755-777.
 
13
J. Ziv and A. Lempel. A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, vol. 23 (1977), pp. 337-343.
 
14
J. Ziv and A. Lempel. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory, vol. 24 (1978), pp. 530-536.

Collaborative Colleagues:
Eric Lehman: colleagues
Abhi Shelat: colleagues