ABSTRACT
This paper presents a highly predictable, low-overhead, and yet dynamic memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast, compiler-managed SRAM that replaces the hardware-managed cache. It is motivated by its better real-time guarantees compared with a cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme [4].
Existing scratch-pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software: instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags, and, just like hardware caches, deliver poor real-time guarantees. A second category of algorithms partitions variables at compile time between the two memory banks, scratch-pad and main memory. For example, our previous work in [3] derives a provably optimal static allocation for global and stack variables and achieves a speedup over all earlier methods. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache.
In this paper we present a dynamic allocation method for global and stack data that, for the first time, (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no run-time checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary.
When compared to a provably optimal static allocation our results show runtime reductions ranging from 11% to 38%, averaging 31.2%, using no additional hardware support. With hardware support for pseudo-DMA and full DMA, which is already provided in some commercial systems, the runtime reductions increase to 33.4% and 34.2% respectively.
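The copy-in and eviction idea described above can be pictured with a small C sketch. It is only an illustration under stated assumptions, not the allocation algorithm of the paper: the scratch-pad is modeled as an ordinary array, whole arrays are swapped one at a time, and the names `spm`, `spm_swap_in`, `spm_flush`, and `run_phases` are hypothetical. It shows that copies happen only at a few fixed program points, while the accesses inside the loops go straight to the scratch-pad with no tag lookup or run-time check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPM_WORDS 4                 /* hypothetical scratch-pad capacity, in words */

static int32_t spm[SPM_WORDS];      /* stands in for the on-chip SRAM */
static int32_t *spm_home = NULL;    /* main-memory home of the data now held in SRAM */

/* Copy code of the kind a compiler could insert at a chosen program point
 * (illustrative policy: a single occupant, swapped at whole-array granularity).
 * The earlier occupant is evicted by writing it back to its home location. */
static void spm_swap_in(int32_t *src)
{
    if (spm_home == src)
        return;                               /* already resident */
    if (spm_home != NULL)
        memcpy(spm_home, spm, sizeof spm);    /* evict: write back earlier data */
    memcpy(spm, src, sizeof spm);             /* bring in the data about to be hot */
    spm_home = src;
}

/* Write the current occupant back and leave the scratch-pad empty. */
static void spm_flush(void)
{
    if (spm_home != NULL)
        memcpy(spm_home, spm, sizeof spm);
    spm_home = NULL;
}

static int32_t a[SPM_WORDS] = {1, 2, 3, 4};      /* lives in main memory */
static int32_t b[SPM_WORDS] = {10, 20, 30, 40};  /* lives in main memory */

int32_t run_phases(void)
{
    int32_t sum = 0;

    spm_swap_in(a);                       /* program point 1: a is about to be hot */
    for (int r = 0; r < 100; r++)
        for (int i = 0; i < SPM_WORDS; i++)
            spm[i] += 1;                  /* direct SRAM access, no tag check */

    spm_swap_in(b);                       /* program point 2: evict a, bring in b */
    for (int i = 0; i < SPM_WORDS; i++)
        sum += spm[i];                    /* again a plain, fixed-latency access */

    spm_flush();                          /* restore main-memory copies */
    return sum + a[0];                    /* a[0] reflects the written-back updates */
}
```

Because every access inside a loop has a statically known location, its latency is fixed. The only extra cost is the two `memcpy` calls at the phase boundaries, which is the kind of infrequent, compiler-placed transfer the abstract refers to.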
REFERENCES
[1]
[2] Oren Avissar, Rajeev Barua, Dave Stewart, Heterogeneous memory management for embedded systems, Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems, November 16-17, 2001, Atlanta, Georgia, USA. doi: 10.1145/502217.502223
[3]
[4] Rajeshwari Banakar, Stefan Steinke, Bo-Sik Lee, M. Balakrishnan, Peter Marwedel, Scratchpad memory: design alternative for cache on-chip memory in embedded systems, Proceedings of the tenth international symposium on Hardware/software codesign, May 06-08, 2002, Estes Park, Colorado. doi: 10.1145/774789.774805
[5] L. A. Belady, A study of replacement algorithms for virtual storage, IBM Systems Journal, 5:78-101, 1966.
[6] John K. Bennett, John B. Carter, Willy Zwaenepoel, Adaptive software cache management for distributed shared memory architectures, Proceedings of the 17th annual international symposium on Computer Architecture, p.125-134, May 28-31, 1990, Seattle, Washington, United States
[7] A. Bestavros, R. L. Carter, M. E. Crovella, C. R. Cunha, A. Heddaya, S. A. Mirdad, Application-level document caching in the Internet, Proceedings of the 2nd International Workshop on Services in Distributed and Networked Environments, p.166, June 05-08, 1995
[8] David Brash, The ARM architecture Version 6 (ARMv6), ARM Ltd., January 2002. White Paper.
[9]
[10] Document No. ARM DDI 0084D, ARM Ltd. ARM7TDMI-S Data sheet, October 1998.
[11]
[12]
[13]
[14] ILOG Corporation, The CPLEX optimization suite, http://www.ilog.com/products/cplex/, 2001.
[15]
[16]
[17] Matlab 6.1. The Math Works, Inc., 2001. http://www.mathworks.com/products/matlab/.
[18] M. Kandemir, J. Ramanujam, J. Irwin, N. Vijaykrishnan, I. Kadayif, A. Parikh, Dynamic management of scratch-pad memory space, Proceedings of the 38th conference on Design automation, p.690-695, June 2001, Las Vegas, Nevada, United States. doi: 10.1145/378239.379049
[19]
[20] CPU12 Reference Manual. Motorola Corporation, 2000. (A 16-bit processor). http://e-www.motorola.com/brdata/PDFDB/MICROCONTROLLERS/16 BIT/68HC12FAMILY/REFMAT/CPU12RM.pdf.
[21] M-CORE - MMC2001 Reference Manual. Motorola Corporation, 1998. (A 32-bit processor). http://www.motorola.com/SPS/MCORE/infodocumentation.htm.
[22] Ioannis Schoinas, Babak Falsafi, Alvin R. Lebeck, Steven K. Reinhardt, James R. Larus, David A. Wood, Fine-grain access control for distributed shared memory, Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.297-306, October 05-07, 1994, San Jose, California, United States
[23] Jan Sjodin, Bo Froderberg, and Thomas Lindgren, Allocation of Global Data Objects in On-Chip RAM, Compiler and Architecture Support for Embedded Computing Systems, December 1998.
[24] Jan Sjödin, Carl von Platen, Storage allocation for embedded processors, Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems, November 16-17, 2001, Atlanta, Georgia, USA. doi: 10.1145/502217.502221
[25] Osman S. Unsal, Raksit Ashok, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, Cool-cache for hot multimedia, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
[26] TMS370Cx7x 8-bit microcontroller. Texas Instruments, Revised Feb. 1997. http://www-s.ti.com/sc/psheets/spns034c/spns034c.pdf.
CITED BY 48
Nghi Nguyen, Angel Dominguez, Rajeev Barua, Memory allocation for embedded systems with a compile-time-unknown scratch-pad size, Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, September 24-27, 2005, San Francisco, California, USA
Bernhard Egger, Chihun Kim, Choonki Jang, Yoonsung Nam, Jaejin Lee, Sang Lyul Min, A dynamic code placement technique for scratchpad memory using postpass optimization, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
O. Ozturk, M. Kandemir, G. Chen, M. J. Irwin, M. Karakoy, Customized on-chip memories for embedded chip multiprocessors, Proceedings of the 2005 conference on Asia South Pacific design automation, January 18-21, 2005, Shanghai, China
O. Ozturk, M. Kandemir, I. Demirkiran, G. Chen, M. J. Irwin, Data compression for improving SPM behavior, Proceedings of the 41st annual conference on Design automation, June 07-11, 2004, San Diego, CA, USA
Federico Angiolini, Francesco Menichelli, Alberto Ferrero, Luca Benini, Mauro Olivieri, A post-compiler approach to scratchpad mapping of code, Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, September 22-25, 2004, Washington DC, USA
Surupa Biswas, Matthew Simpson, Rajeev Barua, Memory overflow protection for embedded systems using run-time checks, reuse and compression, Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, September 22-25, 2004, Washington DC, USA
M. Kandemir, O. Ozturk, M. Karakoy, Dynamic on-chip memory management for chip multiprocessors, Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, September 22-25, 2004, Washington DC, USA
Michael K. Chen, Xiao Feng Li, Ruiqi Lian, Jason H. Lin, Lixia Liu, Tao Liu, Roy Ju, Shangri-La: achieving high performance from compiled network applications while enabling ease of programming, ACM SIGPLAN Notices, v.40 n.6, June 2005
Ali El-Haj-Mahmoud, Ahmed S. AL-Zawawi, Aravindh Anantaraman, Eric Rotenberg, Virtual multiprocessor: an analyzable, high-performance architecture for real-time computing, Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, September 24-27, 2005, San Francisco, California, USA
Ilya Issenin, Erik Brockmeyer, Bart Durinck, Nikil Dutt, Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies, Proceedings of the 43rd annual conference on Design automation, July 24-28, 2006, San Francisco, CA, USA
Surupa Biswas, Thomas Carley, Matthew Simpson, Bhuvan Middha, Rajeev Barua, Memory overflow protection for embedded systems using run-time checks, reuse, and compression, ACM Transactions on Embedded Computing Systems (TECS), v.5 n.4, p.719-752, November 2006
Nghi Nguyen, Angel Dominguez, Rajeev Barua, Scratch-pad memory allocation without compiler support for java applications, Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 30-October 03, 2007, Salzburg, Austria
Doosan Cho, Ilya Issenin, Nikil Dutt, Jonghee W. Yoon, Yunheung Paek, Software controlled memory layout reorganization for irregular array access patterns, Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 30-October 03, 2007, Salzburg, Austria
Angel Dominguez, Nghi Nguyen, Rajeev K. Barua, Recursive function data allocation to scratch-pad memory, Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 30-October 03, 2007, Salzburg, Austria
Alexandros Bartzas, Miguel Peon-Quiros, Stylianos Mamagkakis, Francky Catthoor, Dimitrios Soudris, Jose M. Mendias, Enabling run-time memory data transfer optimizations at the system level with automated extraction of embedded software metadata information, Proceedings of the 2008 conference on Asia and South Pacific design automation, January 21-24, 2008, Seoul, Korea
Doosan Cho, Sudeep Pasricha, Ilya Issenin, Nikil D. Dutt, Minwook Ahn, Yunheung Paek, Adaptive scratch pad memory management for dynamic behavior of multimedia applications, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, v.28 n.4, p.554-567, April 2009
Alexandros Bartzas, Miguel Peon-Quiros, Stylianos Mamagkakis, Francky Catthoor, Dimitrios Soudris, Jose M. Mendias, Direct memory access usage optimization in network applications for reduced memory latency and energy consumption, Journal of Embedded Computing, v.3 n.3, p.241-254, August 2009