A first look at the interplay of code reordering and configurable caches
Source: Great Lakes Symposium on VLSI
Proceedings of the 15th ACM Great Lakes Symposium on VLSI
Chicago, Illinois, USA
POSTER SESSION: Poster session 2
Pages: 416 - 421  
Year of Publication: 2005
ISBN:1-59593-057-4
Authors
Ann Gordon-Ross  University of California, Riverside, CA
Frank Vahid  University of California, Riverside, CA
Nikil Dutt  University of California, Irvine, CA
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1057661.1057760

ABSTRACT

The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache's high impact on system performance and power, and because of the cache's predictable temporal and spatial locality. Optimization techniques can be designed based on this predictability. We explore for the first time the interplay of two popular instruction cache optimization techniques: the long-known technique of code reordering and the relatively new technique of cache configuration. We address the question of whether these two optimizations complement each other or whether one dominates the other. Through experiments using embedded system benchmarks, we show that cache configuration dominates a particular category of code reordering techniques with respect to optimizing performance and energy, obviating the need for reordering. We also examine the modern scenario of synthesized custom caches, and show that combining cache configuration with code reordering results in cache size reductions of 13% on average, and up to 89% in some benchmarks, beyond cache configuration alone.
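
The interplay described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' experimental setup and reproduces none of their results. It assumes a hypothetical loop that alternately calls two 64-byte functions, a simple set-associative LRU cache model, and made-up addresses and cache parameters. It counts instruction cache misses for an original layout and a reordered layout under a few cache configurations.

```python
from collections import OrderedDict

def count_misses(trace, size, ways, line):
    """Count misses of a set-associative LRU cache (sizes in bytes) on an address trace."""
    num_sets = size // (line * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)        # hit: refresh LRU position
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)   # evict least recently used block
            cache_set[block] = True
    return misses

def make_trace(base_a, base_b, func_bytes=64, iterations=100):
    """Hypothetical workload: a loop that runs function A, then function B."""
    def body(base):
        return list(range(base, base + func_bytes, 4))  # 4-byte instructions
    return (body(base_a) + body(base_b)) * iterations

# Assumed layouts: the two hot functions 1 KB apart, or placed back to back.
original = make_trace(0x0000, 0x0400)
reordered = make_trace(0x0000, 0x0040)

LINE = 16  # assumed line size in bytes
for size, ways in [(128, 1), (128, 2), (1024, 1), (2048, 1)]:
    print(f"{size:5d} B, {ways}-way: "
          f"original={count_misses(original, size, ways, LINE)} "
          f"reordered={count_misses(reordered, size, ways, LINE)}")
```

In this toy case the original layout thrashes a 128-byte direct-mapped cache, while either changing the configuration (two ways) or reordering the code removes the conflict misses. With reordering, the smallest direct-mapped cache that avoids thrashing shrinks from 2048 bytes to 128 bytes, which is the kind of size reduction the abstract attributes to combining the two techniques.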

