ACM Home Page
Please provide us with feedback. Feedback
Approximating the smallest grammar: Kolmogorov complexity in natural models
Full text PdfPdf (270 KB)
Source Annual ACM Symposium on Theory of Computing archive
Proceedings of the thiry-fourth annual ACM symposium on Theory of computing table of contents
Montreal, Quebec, Canada
SESSION: Session 12B table of contents
Pages: 792 - 801  
Year of Publication: 2002
ISBN:1-58113-495-9
Authors
Moses Charikar  Princeton University
Eric Lehman  MIT
Ding Liu  Princeton University
Rina Panigrahy  Cisco Systems
Manoj Prabhakaran  Princeton University
April Rasala  MIT
Amit Sahai  Princeton University
abhi shelat  MIT
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 49,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/509907.510021
What is a DOI?

ABSTRACT

We consider the problem of finding the smallest context-free grammar that generates exactly one given string of length n. The size of this grammar is of theoretical interest as an efficiently computable variant of Kolmogorov complexity. The problem is of practical importance in areas such as data compression and pattern extraction.The smallest grammar is known to be hard to approximate to within a constant factor, and an o(logn/log logn) approximation would require progress on a long-standing algebraic problem [10]. Previously, the best proved approximation ratio was O(n1/2) for the Bisection algorithm [8]. Our main result is an exponential improvement of this ratio; we give an O(log (n/g*)) approximation algorithm, where g* is the size of the smallest grammar.We then consider other computable variants of Kolomogorov complexity. In particular we give an O(log2 n) approximation for the smallest non-deterministic finite automaton with advice that produces a given string. We also apply our techniques to "advice-grammars" and "edit-grammars", two other natural models of string complexity.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
V. Chvátal. A greedy heuristic for the set-covering problem. (MATH)ematics of Operations Research, 4(3):233--235, 1979.
 
3
 
4
 
5
M. Farach-Colton. Personal communication.
 
6
 
7
J. C. Kieffer and E. hui Yang. Grammar-based codes: A new class of universal lossless source codes. IEEE Transactions on Information Theory, 46(3):737--754, 2000.
 
8
J. C. Kieffer, E. hui Yang, G. J. Nelson, and P. Cosman. Universal lossless compression via multilevel pattern matching. IEEE Transactions on Information Theory, 46(4):1227--1245, 2000.
 
9
 
10
 
11
C. Nevill-Manning. Inferring Sequential Structure. PhD thesis, University of Waikato, 1996.
 
12
C. G. Nevill-Manning and I. H. Witten. Compression and explanation using hierarchical grammars. The Computer Journal, 40(2/3):103--116, 1997.
 
13
 
14
E. h. Yang and J. C. Kieffer. Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform--part one: Without context models. IEEE Transactions on Information Theory, 46(3):755--777, 2000.
 
15
J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337--343, 1977.
 
16
J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, IT-24:530--536, 1978.


Collaborative Colleagues:
Moses Charikar: colleagues
Eric Lehman: colleagues
Ding Liu: colleagues
Rina Panigrahy: colleagues
Manoj Prabhakaran: colleagues
April Rasala: colleagues
Amit Sahai: colleagues
abhi shelat: colleagues