ABSTRACT
Microcoded customized IPs offer superior performance and direct programmability of micro-architectural structures compared to instruction-based processors, but at the cost of drastically enlarged code sizes. Code compression can reduce code size, but it must be applied with attention to performance so that the performance benefits of microcoded IPs are not squandered in the process. To attain this goal, we propose in this paper a fast code compression technique that exploits the fact that microcodes contain a sizable fraction of unspecified bits. Although the values and positions of the specified bits are highly irregular, the proposed technique can still flexibly and precisely reproduce every specified bit by means of a linear network. The linearity inherent in the compression strategy in turn enables an extremely low-overhead decompression engine: at runtime, the decompressed code is generated by a fixed-bandwidth XOR network that fills in all the specified bits as required. Combining the proposed flexible XOR-based network with a minimal two-level storage scheme for highly specified fields, such as immediate values, delivers a high degree of code compression at a negligible cost in performance and hardware.
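The compression/decompression pair described above can be illustrated with a minimal sketch. It assumes a hypothetical 4-bit seed expanded into an 8-bit microcode word through a hand-picked tap matrix `NETWORK` (each entry is the bitmask of seed bits XORed into one output bit); the actual network dimensions and tap selection used in the paper are not given here, and the names `compress`, `decompress`, and `parity` are illustrative only. Compression treats each specified bit as one linear equation over GF(2) and solves the system by Gauss-Jordan elimination, leaving unspecified (`X`) bits unconstrained; decompression is simply the XOR network itself.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical XOR network: output bit i = XOR of the seed bits set in NETWORK[i].
NETWORK: List[int] = [0b0001, 0b0010, 0b0100, 0b1000,
                      0b0011, 0b0110, 0b1100, 0b1001]


def parity(x: int) -> int:
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def decompress(seed: int, network: List[int]) -> str:
    """Expand a seed into a full word; each output bit is one XOR gate."""
    return "".join(str(parity(taps & seed)) for taps in network)


def compress(word: str, network: List[int]) -> Optional[int]:
    """Find a seed reproducing every '0'/'1' bit of word; 'X' bits are free.

    Returns None if the specified bits cannot all be met by this network.
    """
    rows: Dict[int, Tuple[int, int]] = {}  # pivot bit -> (taps, rhs), reduced form
    for taps, bit in zip(network, word):
        if bit == "X":
            continue  # unspecified bit: imposes no equation
        rhs = int(bit)
        # Eliminate every existing pivot bit from the new equation.
        for p, (ptaps, prhs) in rows.items():
            if (taps >> p) & 1:
                taps ^= ptaps
                rhs ^= prhs
        if taps == 0:
            if rhs:
                return None  # contradicts earlier specified bits
            continue  # already implied by earlier equations
        p = taps.bit_length() - 1
        # Gauss-Jordan step: clear the new pivot from all existing rows.
        for q, (qtaps, qrhs) in list(rows.items()):
            if (qtaps >> p) & 1:
                rows[q] = (qtaps ^ taps, qrhs ^ rhs)
        rows[p] = (taps, rhs)
    # Free (non-pivot) seed bits are set to 0, so each pivot bit equals its rhs.
    seed = 0
    for p, (_, rhs) in rows.items():
        seed |= rhs << p
    return seed


if __name__ == "__main__":
    seed = compress("1X0XX1X1", NETWORK)
    print(seed, decompress(seed, NETWORK))  # → 3 11000101
```

Because free seed bits are fixed to 0, the returned seed is only one of possibly several valid encodings. A `None` result means the specified bits exceed what this particular network can encode; the abstract handles highly specified fields such as immediates with a separate two-level storage, which this sketch does not model.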