ACM Home Page
Please provide us with feedback. Feedback
PPMexe: Program compression
Full text PdfPdf (737 KB)
Source ACM Transactions on Programming Languages and Systems (TOPLAS) archive
Volume 29 ,  Issue 1  (January 2007) table of contents
Article No. 3  
Year of Publication: 2007
ISSN:0164-0925
Authors
Milenko Drinić  Microsoft Research, Redmond, WA
Darko Kirovski  Microsoft Research, Redmond, WA
Hoi Vo  Microsoft Research, Redmond, WA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 52,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1180475.1180478
What is a DOI?

ABSTRACT

With the emergence of software delivery platforms, code compression has become an important system component that strongly affects performance. This article presents PPMexe, a compression mechanism for program binaries that analyzes their syntax and semantics to achieve superior compression ratios. We use the generic paradigm of prediction by partial matching (PPM) as the foundation of our compression codec. PPMexe combines PPM with two preprocessing steps: (i) instruction rescheduling to improve prediction rates and (ii) heuristic partitioning of a program binary into streams with high autocorrelation. We improve the traditional PPM algorithm by (iii) using an additional alphabet of frequent variable-length supersymbols extracted from the input stream of fixed-length symbols. In addition, PPMexe features (iv) a low-overhead mechanism that enables decompression starting from an arbitrary instruction of the executable, a property pivotal for runtime software delivery. We implemented PPMexe for x86 binaries and tested it on several large applications. Binaries compressed using PPMexe were 18--24% smaller than files created using off-the-shelf PPMD, one of the best available compressors


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Baker, B. S. and Manber, U. 1998. Deducing similarities in Java sources from bytecodes. In Proceedings of the USENIX Annual Technical Conference. 179--190.
 
3
Bunton, S. 1997. Semantically motivated improvements for PPM variants. Comput. J. 40, 2/3, 76--93.
 
4
Burrows, M. and Wheeler, D. 1994. A block-sorting lossless data compression algorithm. Tech. Rep., Digital Equipment Corporation.
 
5
6
7
 
8
Cleary, J. and Witten, I. 1984. Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. 32, 4, 396--402.
9
10
11
12
13
 
14
Gilchrist, J. 2000. The archive compression test. http://compression.ca.
 
15
 
16
17
 
18
 
19
 
20
Howard, P. and Vitter, J. 1993. Design and analysis of fast text compression based on quasi-arithmetic coding. In Proceedings of the Data Compression Conference. 98--107.
 
21
Huffman, D. 1952. A method for construction of minimum redundancy codes. Proc. IEEE 40, 1098--1101.
 
22
Intel Corp. 1999a. http://www.intel.com/design/pentiumiii.
 
23
Intel Corp. 1999b. Intel architecture software developer's manual, vol. 2: Instruction set reference manual. http://developer.intel.com/design/processor/.
 
24
Intel Corp. 2000. http://www.intel.com/design/pentium4.
 
25
 
26
Kolmogorov, A. N. 1965. Three approaches to the quantitative definition of information. Problems Inf. Transmission 1, 1, 1--7.
27
28
 
29
30
 
31
32
 
33
Moffat, A. 1990. Implementing the PPM data compression scheme. IEEE Trans. Commun. 38, 11, 1917--1921.
 
34
Mohney, D. 2003. It's all about the last mile. http://www.theinquirer.net.
35
36
37
38
 
39
Rissanen, J. 1978. Modeling by shortest data description. Automatica 14, 465--471.
 
40
Rissanen, J. and Mohiuddin, K. M. 1989. A multiplication-free multialphabet arithmetic code. IEEE Trans. Commun. 37, 3, 129--146.
41
 
42
Shannon, C. 1951. Prediction and entropy of printed English. Bell Syst. Tech. J. 50--64.
 
43
Srivastava, A. and Vo, H. 2001. Vulcan: Binary transformation in a distributed environment. Tech. Rep. MSR-TR-2001-50. Microsoft Research.
 
44
 
45
 
46
Weaver, C. 2000. SPEC2000 binaries. http://www.simplescalar.org.
 
47
48
49
 
50
Ziv, J. and Lempel, A. 1978. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theor. IT-24, 530--536.

Collaborative Colleagues:
Milenko Drinić: colleagues
Darko Kirovski: colleagues
Hoi Vo: colleagues