|
ABSTRACT
With the emergence of software delivery platforms, code compression has become an important system component that strongly affects performance. This article presents PPMexe, a compression mechanism for program binaries that analyzes their syntax and semantics to achieve superior compression ratios. We use the generic paradigm of prediction by partial matching (PPM) as the foundation of our compression codec. PPMexe combines PPM with two preprocessing steps: (i) instruction rescheduling to improve prediction rates and (ii) heuristic partitioning of a program binary into streams with high autocorrelation. We improve the traditional PPM algorithm by (iii) using an additional alphabet of frequent variable-length supersymbols extracted from the input stream of fixed-length symbols. In addition, PPMexe features (iv) a low-overhead mechanism that enables decompression starting from an arbitrary instruction of the executable, a property pivotal for runtime software delivery. We implemented PPMexe for x86 binaries and tested it on several large applications. Binaries compressed using PPMexe were 18--24% smaller than files created using off-the-shelf PPMD, one of the best available compressors
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Baker, B. S. and Manber, U. 1998. Deducing similarities in Java sources from bytecodes. In Proceedings of the USENIX Annual Technical Conference. 179--190.
|
| |
3
|
Bunton, S. 1997. Semantically motivated improvements for PPM variants. Comput. J. 40, 2/3, 76--93.
|
| |
4
|
Burrows, M. and Wheeler, D. 1994. A block-sorting lossless data compression algorithm. Tech. Rep., Digital Equipment Corporation.
|
| |
5
|
Martin Burtscher , Ilya Ganusov , Sandra J. Jackson , Jian Ke , Paruj Ratanaworabhan , Nana B. Sam, The VPC Trace-Compression Algorithms, IEEE Transactions on Computers, v.54 n.11, p.1329-1344, November 2005
[doi> 10.1109/TC.2005.186]
|
 |
6
|
|
 |
7
|
|
| |
8
|
Cleary, J. and Witten, I. 1984. Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. 32, 4, 396--402.
|
 |
9
|
|
 |
10
|
Jens Ernst , William Evans , Christopher W. Fraser , Todd A. Proebsting , Steven Lucco, Code compression, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.358-365, June 16-18, 1997, Las Vegas, Nevada, United States
|
 |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
Gilchrist, J. 2000. The archive compression test. http://compression.ca.
|
| |
15
|
|
| |
16
|
|
 |
17
|
Inki Hong , Darko Kirovski , Miodrag Potkonjak, Potential-driven statistical ordering of transformations, Proceedings of the 34th annual conference on Design automation, p.347-352, June 09-13, 1997, Anaheim, California, United States
[doi> 10.1145/266021.266161]
|
| |
18
|
|
| |
19
|
|
| |
20
|
Howard, P. and Vitter, J. 1993. Design and analysis of fast text compression based on quasi-arithmetic coding. In Proceedings of the Data Compression Conference. 98--107.
|
| |
21
|
Huffman, D. 1952. A method for construction of minimum redundancy codes. Proc. IEEE 40, 1098--1101.
|
| |
22
|
Intel Corp. 1999a. http://www.intel.com/design/pentiumiii.
|
| |
23
|
Intel Corp. 1999b. Intel architecture software developer's manual, vol. 2: Instruction set reference manual. http://developer.intel.com/design/processor/.
|
| |
24
|
Intel Corp. 2000. http://www.intel.com/design/pentium4.
|
| |
25
|
Darko Kirovski , Johnson Kin , William H. Mangione-Smith, Procedure based program compression, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.204-213, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
26
|
Kolmogorov, A. N. 1965. Three approaches to the quantitative definition of information. Problems Inf. Transmission 1, 1, 1--7.
|
 |
27
|
|
 |
28
|
Jeremy Lau , Stefan Schoenmackers , Timothy Sherwood , Brad Calder, Reducing code size with echo instructions, Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, October 30-November 01, 2003, San Jose, California, USA
[doi> 10.1145/951710.951724]
|
| |
29
|
|
 |
30
|
|
| |
31
|
Stan Liao , Srinivas Devadas , Kurt Keutzer , Steve Tjiang, Instruction selection using binate covering for code size optimization, Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design, p.393-399, November 05-09, 1995, San Jose, California, United States
|
 |
32
|
|
| |
33
|
Moffat, A. 1990. Implementing the PPM data compression scheme. IEEE Trans. Commun. 38, 11, 1917--1921.
|
| |
34
|
Mohney, D. 2003. It's all about the last mile. http://www.theinquirer.net.
|
 |
35
|
|
 |
36
|
|
 |
37
|
|
 |
38
|
|
| |
39
|
Rissanen, J. 1978. Modeling by shortest data description. Automatica 14, 465--471.
|
| |
40
|
Rissanen, J. and Mohiuddin, K. M. 1989. A multiplication-free multialphabet arithmetic code. IEEE Trans. Commun. 37, 3, 129--146.
|
 |
41
|
Theodore H. Romer , Dennis Lee , Geoffrey M. Voelker , Alec Wolman , Wayne A. Wong , Jean-Loup Baer , Brian N. Bershad , Henry M. Levy, The structure and performance of interpreters, Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, p.150-159, October 01-04, 1996, Cambridge, Massachusetts, United States
|
| |
42
|
Shannon, C. 1951. Prediction and entropy of printed English. Bell Syst. Tech. J. 50--64.
|
| |
43
|
Srivastava, A. and Vo, H. 2001. Vulcan: Binary transformation in a distributed environment. Tech. Rep. MSR-TR-2001-50. Microsoft Research.
|
| |
44
|
|
| |
45
|
|
| |
46
|
Weaver, C. 2000. SPEC2000 binaries. http://www.simplescalar.org.
|
| |
47
|
|
 |
48
|
|
 |
49
|
|
| |
50
|
Ziv, J. and Lempel, A. 1978. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theor. IT-24, 530--536.
|
|