|
ABSTRACT
We consider the problem of finding the smallest context-free grammar that generates exactly one given string of length n. The size of this grammar is of theoretical interest as an efficiently computable variant of Kolmogorov complexity. The problem is of practical importance in areas such as data compression and pattern extraction.The smallest grammar is known to be hard to approximate to within a constant factor, and an o(logn/log logn) approximation would require progress on a long-standing algebraic problem [10]. Previously, the best proved approximation ratio was O(n1/2) for the Bisection algorithm [8]. Our main result is an exponential improvement of this ratio; we give an O(log (n/g*)) approximation algorithm, where g* is the size of the smallest grammar.We then consider other computable variants of Kolomogorov complexity. In particular we give an O(log2 n) approximation for the smallest non-deterministic finite automaton with advice that produces a given string. We also apply our techniques to "advice-grammars" and "edit-grammars", two other natural models of string complexity.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
V. Chvátal. A greedy heuristic for the set-covering problem. (MATH)ematics of Operations Research, 4(3):233--235, 1979.
|
| |
3
|
|
| |
4
|
|
| |
5
|
M. Farach-Colton. Personal communication.
|
| |
6
|
|
| |
7
|
J. C. Kieffer and E. hui Yang. Grammar-based codes: A new class of universal lossless source codes. IEEE Transactions on Information Theory, 46(3):737--754, 2000.
|
| |
8
|
J. C. Kieffer, E. hui Yang, G. J. Nelson, and P. Cosman. Universal lossless compression via multilevel pattern matching. IEEE Transactions on Information Theory, 46(4):1227--1245, 2000.
|
| |
9
|
|
| |
10
|
|
| |
11
|
C. Nevill-Manning. Inferring Sequential Structure. PhD thesis, University of Waikato, 1996.
|
| |
12
|
C. G. Nevill-Manning and I. H. Witten. Compression and explanation using hierarchical grammars. The Computer Journal, 40(2/3):103--116, 1997.
|
| |
13
|
|
| |
14
|
E. h. Yang and J. C. Kieffer. Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform--part one: Without context models. IEEE Transactions on Information Theory, 46(3):755--777, 2000.
|
| |
15
|
J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337--343, 1977.
|
| |
16
|
J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, IT-24:530--536, 1978.
|
|