ABSTRACT
The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache's high impact on system performance and power, and because of the cache's predictable temporal and spatial locality. Optimization techniques can exploit this predictability. We explore, for the first time, the interplay of two popular instruction cache optimization techniques: the long-known technique of code reordering and the relatively new technique of cache configuration. We address whether the two optimizations complement each other or whether one dominates the other. Through experiments using embedded system benchmarks, we show that cache configuration dominates a particular category of code reordering techniques with respect to optimizing performance and energy, obviating the need for reordering. We also examine the modern scenario of synthesized custom caches, and show that combining cache configuration with code reordering reduces cache size by 13% on average, and by up to 89% in some benchmarks, beyond what cache configuration achieves alone.
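As a rough illustration of the two optimizations contrasted above, the following minimal sketch counts misses for an instruction-address trace in a set-associative cache with LRU replacement. It is not the authors' methodology: the helpers `simulate` and `loop_trace`, the 1 KB cache with 32-byte lines, and the workload of two 512-byte functions called alternately in a loop are all hypothetical assumptions chosen for illustration. With the two functions placed 1 KB apart, they map to the same sets of a direct-mapped cache and evict each other on every iteration; either relocating one function (code reordering) or switching to 2-way associativity (cache configuration) removes those conflict misses.

```python
from collections import OrderedDict


def simulate(trace, size_bytes, line_bytes, ways):
    """Count misses for an address trace in a set-associative cache with LRU replacement."""
    num_sets = size_bytes // (line_bytes * ways)
    # Each set is an OrderedDict of tags, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_sets
        tag = block // num_sets
        cache_set = sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict the least recently used tag
            cache_set[tag] = True
    return misses


def loop_trace(func_a, func_b, func_len, iterations, step=4):
    """Hypothetical workload: each loop iteration executes function A, then function B."""
    trace = []
    for _ in range(iterations):
        trace.extend(range(func_a, func_a + func_len, step))
        trace.extend(range(func_b, func_b + func_len, step))
    return trace


direct_mapped = dict(size_bytes=1024, line_bytes=32, ways=1)
conflicting = loop_trace(func_a=0, func_b=1024, func_len=512, iterations=10)
reordered = loop_trace(func_a=0, func_b=512, func_len=512, iterations=10)

print(simulate(conflicting, **direct_mapped))  # 320: A and B evict each other every iteration
print(simulate(reordered, **direct_mapped))    # 32: only cold misses after moving B
print(simulate(conflicting, size_bytes=1024, line_bytes=32, ways=2))  # 32: 2-way absorbs the conflict
```

In this toy case both fixes eliminate the same conflict misses, which hints at why a configurable cache can overlap with reordering that targets conflict misses; real benchmarks are far less tidy than this two-function loop.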