ABSTRACT
Code compression coupled with dynamic decompression is an important technique for both embedded and general-purpose microprocessors. Postfetch decompression, in which decompression is performed after the compressed instructions have been fetched, allows the instruction cache to store compressed code but requires a highly efficient decompression implementation. We propose implementing postfetch decompression using a new hardware facility called dynamic instruction stream editing (DISE). DISE provides a programmable decoder (similar in structure to those in many IA-32 processors) that is used to add functionality to an application by injecting custom code snippets into its fetched instruction stream. We present a DISE-based implementation of postfetch decompression and show that it naturally supports customized program-specific decompression dictionaries, enables parameterized decompression allowing similar-but-not-identical instruction sequences to share dictionary entries, and uses no decompression-specific hardware. We present extensive experimental results showing the virtue of this approach and evaluating the factors that impact its efficacy. We also present implementation-neutral results that give insight into the characteristics of any postfetch decompression technique. Our experiments not only demonstrate significant reduction in code size (up to 35%) but also significant improvements in performance (up to 20%) and energy (up to 10%).
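As a rough illustration of parameterized dictionary decompression, the sketch below expands codewords in a fetched instruction stream by filling parameters into a shared dictionary template. It is a software analogy under stated assumptions, not the DISE hardware interface: the `DICTIONARY` layout, the tuple codeword encoding, the `decompress` function, and the Alpha-like instruction strings are all hypothetical.

```python
# Hypothetical illustration only: the dictionary layout, codeword encoding,
# and instruction strings are assumptions, not the DISE mechanism itself.

# Each dictionary entry is a template: a list of instruction strings in which
# "{0}", "{1}", ... stand for parameters supplied by the codeword.
DICTIONARY = {
    0: ["lda {0}, 8({0})", "ldq {1}, 0({0})", "addq {1}, 1, {1}"],
}


def decompress(stream, dictionary):
    """Expand codewords in a fetched instruction stream.

    Ordinary instructions are plain strings and pass through unchanged.
    A codeword is a tuple (entry_id, param0, param1, ...).
    """
    out = []
    for item in stream:
        if isinstance(item, tuple):
            entry_id, *params = item
            # Fill the codeword's parameters into the shared template.
            out.extend(t.format(*params) for t in dictionary[entry_id])
        else:
            out.append(item)
    return out


# Two similar-but-not-identical sequences share dictionary entry 0;
# only the register parameters differ.
compressed = [
    "subq r9, 1, r9",
    (0, "r1", "r2"),
    (0, "r3", "r4"),
]
expanded = decompress(compressed, DICTIONARY)
for insn in expanded:
    print(insn)
```

In this toy encoding, three stream items expand to seven instructions, and a single template serves both sequences because the registers are passed as parameters rather than stored in the dictionary.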