ACM Home Page
Please provide us with feedback. Feedback
The relationship between greedy parsing and symbolwise text compression
Full text PdfPdf (1.03 MB)
Source Journal of the ACM (JACM) archive
Volume 41 ,  Issue 4  (July 1994) table of contents
Pages: 708 - 724  
Year of Publication: 1994
ISSN:0004-5411
Authors
Timothy C. Bell  Univ. of Canterbury, Christchurch, New Zealand
Ian H. Witten  Univ. of Waikato, Hamilton, New Zealand
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 35,   Citation Count: 0
Additional Information:

abstract   references   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/179812.179892
What is a DOI?

ABSTRACT

Text compression methods can be divided into two classes: symbolwise and parsing. Symbolwise methods assign codes to individual symbols, while parsing methods assign codes to groups of consecutive symbols (phrases). The set of phrases available to a parsing method is referred to as a dictionary. The vast majority of parsing methods in the literature use greedy parsing (including nearly all variations of the popular Ziv-Lempel methods). When greedy parsing is used, the coder processes a string from left to right, at each step encoding as many symbols as possible with a phrase from the dictionary. This parsing strategy is not optimal, but an optimal method cannot guarantee a bounded coding delay. An important problem in compression research has been to establish the relationship between symbolwise methods and parsing methods. This paper extends prior work that shows that there are symbolwise methods that simulate a subset of greedy parsing methods. We provide a more general algorithm that takes any nonadaptive greedy parsing method and constructs a symbolwise method that achieves exactly the same compression. Combined with the existence of symbolwise equivalents for two of the most significant adaptive parsing methods, this result gives added weight to the idea that research aimed at increasing compression should concentrate on symbolwise methods, while parsing methods should be chosen for speed or temporary storage considerations.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
~BELL, T. C. 1986. Better OPM/L text compression. IEEE Trans. Comm. COM-34, 12, pp. ~1176 1182.
 
2
~BELL, T. C. 1987. A unifying theory and improvements for existing approaches to text compres-sion. Ph.D. dissertation. Department of Computer Science, Univ. Canterbury, New Zealand.
 
3
 
4
5
 
6
~BOOKSTEIN, A., AND FOUTY, G. 1976. A mathematical model for estimating the effectiveness of ~bigram coding, Inf. Proc. Manag. 12, 111 116.
 
7
~BRENT, R. P. 1987. A linear algorithm for data compression. Australian Comput. J. 19, 2, 64-68.
 
8
~CLEARY, J. G., AND WITTEN, 1. H. 1984. Data compression using adaptive coding and partial ~string matching. IEEE Trans. Commun. COM-32, 4 (July), 396-402.
 
9
 
10
~FENW~CK, P. M. 1993. Ziv Lempel encoding with multi-bit flags. In Data Compression Confer- ~ence (DCC 93) (Snowbird, Ut.). IEEE Computer Society Press, Los Alamitos, Calif., pp. ~138-147.
11
12
 
13
~GUTMANN, P. C. 1992. Practical dictionary/arithmetic data compression systems. M.Sc. disserta- ~tion, Univ. Auckland, Auckland, New Zealand.
 
14
~HORSPOOL, R. N. 1991. Improving LZW. In Data Compresszon Conference (DCC 91) (Snowbird, ~Ut.). IEEE Computer Society Press, Los Alamitos, Calif., pp. 332-341.
 
15
~HUFFMAN, D. A. 1952. A method for the construction of minimum redundancy codes. Proc. Inst. ~adio Eng. 40, 1098-1101.
 
16
 
17
~LANGDON, G. 1983. A note on the Ziv-Lempel model for compressing individual sequences. ~IEEE Trans. Inf. Theory, IT-29, 284-287.
 
18
~MAYNE, A., AND JAMES, E. g. 1975. Information compression by factorising common strings. ~Cornput. J. 18, 2, 157-160.
 
19
~MILLER, g. S., AND WEGMAN, M. N. 1984. Variations on a theme by Ziv and Lempel. In ~Combinatorial Algorithms on Words (A. Apostolico and Z. Galil, eds.). NATO ASI series, vol. ~F12. Springer-Verlag, Berlin, pp. 131-140.
 
20
~PERL, Y., MARAM, AND KADAKUNTLA, N. 1991. The cascading of the LZW compression algo- ~rithm with arithmetic coding. In Data Compression Conference (DCC 91) (Snowbird, Ut.). IEEE ~Computer Society Press, Los Alamitos, Calif., pp. 277 286.
 
21
PIKE, J. 1981. Text compression using a 4-bit coding system. Comput. J. 24, 4, 324 330.
 
22
RISSANEN, J., AND LANGDON, G. 1981. Universal modeling and coding. IEEE Trans. hlf. Theory ~IT-27, 12-23.
23
24
 
25
~RUTH, S. S., AND KREUTZER, P. J. 1972. Data compression for large business files. Datamation ~(Sept)., 62-66.
 
26
~SCHIEBER, W. D., AND THOMAS, G. W. 1971. An algorithm for compaction of alphanumeric data. ~J. Library Automation 4, 198-206.
 
27
~SCnUEGRAF, E. J., AND HEAPS, H. S. 1973. Selection of equifrequent word fragments for ~information retrieval. Inf. Stor. Ret., 9, 697-711.
 
28
~SCHUEGRAF, E. J., AND HEAPS, H. S. 1974. A comparison of algorithms for database compression ~by use of fragments as language elements. Inf. Stor. Ret. lO, 309-319.
 
29
~SEVERANCE, D. G. 1983. A practitioner's guide to database compression. Inf. Syst. 8, 1, 51-62.
 
30
~SHANNON, C. E. 1948. A mathematical theory of communication. Bell Syst. Tech. J, 27, 379-423, ~623-656.
 
31
~SNYDERMAN, M., AND HUNT, B. 1970. The myriad virtues of text compaction. Datamation I, ~(Dec.), 36-40.
32
 
33
~SVANKS, M. I. 1975. Optimizing the storage of alphanumeric data, Canad. Datasystems (May), ~38-40.
 
34
~TI:SCHER, P. 1987. A modified L~mpcl-giv Welch data compression scheme. Austr. ('omput. ~Sci. Cornmun. 9, 1, 262-272.
35
 
36
~WELCH, T. A. 1984. A technique for high performance data compression. Compater 17, 6, 8-19.
 
37
~WHITE, H. E. 1967. Printed Enghsh compression by dmtlonary encoding Proc. IEEE 55, 3, ~390 396.
 
38
~WILLIAMS, R. N. 1991. An extremely fast Z~v-Lempel data compression algorithm In Data ~Compression Co~zference (DCC 91) {Snowbird, Ut.). IEEE Computer Society Press, Los Alami- ~tos, Cahf., pp. 362-369.
39
 
40
~WOLFF, J. G. 1978. Recoding of natural language for economy of transmission or storage. ~Computer.l. 21, 1, 42-44.
 
41
~ZlV. J., AND LEMPEL, A. 1977. A universal algorithm for sequentml data compression. IEEE ~Trans. In{. Theory 1T-23, 3, 337-343.
 
42
~ZIV, J., AND LEMPEL, A. 1978 Compression of individual sequences via variable-rate coding. ~IEEE Trans. hlf. Theory IT-24, 5,530-536.


REVIEW

"Mihaela Carstea : Reviewer"

Memory was, is, and will be a critical resource for a computer. To deal with this fact, at least to a certain extent, many compression methods have been introduced. An important problem in compression research is establishing the r  more...

Collaborative Colleagues:
Timothy C. Bell: colleagues
Ian H. Witten: colleagues