Instruction cache locking inside a binary rewriter
Source
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Grenoble, France
SESSION: Microfluidics, worst-case execution time, and cache optimization
Pages 185-194
Year of Publication: 2009
ISBN: 978-1-60558-626-7
Authors
Kapil Anand  University of Maryland, College Park, College Park, MD, USA
Rajeev Barua  University of Maryland, College Park, College Park, MD, USA
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1629395.1629422

ABSTRACT

Cache memories in embedded systems play an important role in reducing the execution time of applications. Various extensions have been added to cache hardware to let software participate in replacement decisions, improving run-time over a purely hardware-managed cache. Recent embedded processors, such as Intel's XScale and ARM Cortex, can lock one or more lines in the cache, a feature called cache locking. This paper presents the first method in the literature for instruction-cache locking that reduces the average-case run-time of a program. We devise a cost-benefit model to determine which memory addresses should be locked in the cache. We implement our scheme inside a binary rewriter, which makes it applicable to binaries produced by any compiler. Results on a suite of MiBench and MediaBench benchmarks show average improvements of up to 25% in instruction-cache miss rate and up to 13.5% in execution time for applications whose bottleneck is instruction accesses, depending on the cache configuration. For some benchmarks, the improvement in execution time is as high as 23.5%.
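The abstract does not spell out the cost-benefit model, so the following Python sketch is only a hypothetical illustration of the idea, not the authors' algorithm. It assumes an LRU set-associative instruction cache driven by an instruction-address trace, and assumes locked lines are preloaded at no cost and each permanently occupy one way of their set. The helper names `simulate` and `choose_locked_lines` are invented for this example. Locking a line removes that line's misses (the benefit) but shrinks the capacity left for the other lines mapping to the same set (the cost); the sketch measures the net effect by re-simulating the trace and greedily locks the line with the largest net gain.

```python
from collections import OrderedDict


def simulate(trace, num_sets, ways, line_size, locked=frozenset()):
    """Count misses of a hypothetical LRU set-associative instruction cache.

    Assumption: `locked` holds line numbers (addr // line_size) that are
    preloaded for free; each one permanently occupies a way of its set.
    """
    pinned = {}
    for line in locked:
        pinned.setdefault(line % num_sets, set()).add(line)
    if any(len(p) > ways for p in pinned.values()):
        raise ValueError("more locked lines than ways in a set")

    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_size
        s = line % num_sets
        if line in pinned.get(s, ()):
            continue  # hit on a locked line
        free_ways = ways - len(pinned.get(s, ()))
        lru = sets[s]
        if line in lru:
            lru.move_to_end(line)  # hit: mark as most recently used
            continue
        misses += 1
        if free_ways == 0:
            continue  # every way is locked, so the fetch bypasses the cache
        if len(lru) >= free_ways:
            lru.popitem(last=False)  # evict the least recently used line
        lru[line] = True
    return misses


def choose_locked_lines(trace, num_sets, ways, line_size, max_locked_per_set):
    """Greedily lock the line with the largest net reduction in misses.

    Net benefit = misses saved on the locked line minus extra misses caused
    by the reduced capacity; re-simulating the trace measures both at once.
    """
    locked = set()
    misses = simulate(trace, num_sets, ways, line_size)
    candidates = sorted({addr // line_size for addr in trace})
    while True:
        best, best_gain = None, 0
        for line in candidates:
            if line in locked:
                continue
            s = line % num_sets
            if sum(1 for x in locked if x % num_sets == s) >= max_locked_per_set:
                continue  # no lockable way left in this set
            gain = misses - simulate(trace, num_sets, ways, line_size, locked | {line})
            if gain > best_gain:
                best, best_gain = line, gain
        if best is None:  # no remaining candidate lowers the miss count
            return locked, misses
        locked.add(best)
        misses -= best_gain


if __name__ == "__main__":
    trace = [0, 1, 2] * 10  # a loop over three lines that share one set
    print(simulate(trace, num_sets=1, ways=2, line_size=1))  # 30
    print(choose_locked_lines(trace, 1, 2, 1, max_locked_per_set=1))  # ({0}, 20)
```

In the usage example, a 2-way LRU set thrashes on a 3-line loop and misses on every fetch; locking one line turns its fetches into hits while the other two still conflict in the single remaining way, so the net gain is 10 misses. Re-simulating costs one full pass over the trace per candidate per greedy step, which is simple but slow; an analytical per-set estimate would be cheaper at the price of approximation.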

