ABSTRACT
This paper presents the first automatic scheme to allocate local (stack) data in recursive functions to scratch-pad memory (SPM) in embedded systems. A scratch-pad is a fast, directly addressed, compiler-managed SRAM memory that replaces the hardware-managed cache. Its use is motivated by its significantly lower access time, energy consumption, area and overall runtime, and by its tighter real-time bounds. Existing compiler methods for allocating data to scratch-pad can place only code, global, heap and non-recursive stack data in scratch-pad memory; stack data for recursive functions is allocated entirely in DRAM, resulting in poor performance. In this paper we present a dynamic yet compiler-directed allocation method for recursive function stack data that, for the first time, is able to place a portion of recursive stack data in scratch-pad. It has almost no software-caching overhead and is able to move recursive function data back and forth between scratch-pad and DRAM to better track the program's locality characteristics. With our method, all code, global, stack and heap variables can share the same scratch-pad. Compared to placing all recursive function data in DRAM and all other variables in scratch-pad, our results show that our method reduces the average runtime of our benchmarks by 29.3% and the average power consumption by 31.1%, for the same scratch-pad size, fixed at 5% of total data size. Furthermore, significant savings were observed when comparing our method against cache-based alternatives to SPM allocation. Finally, we show results that analyze the effects of profile variation on our allocation approach and present a modified version of our method that minimizes variation for profile-based allocations.
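To make the trade-off of moving recursive frames between scratch-pad and DRAM concrete, the following toy model keeps the top K frames of a recursion in SPM and spills older frames to DRAM as the recursion deepens, copying them back as it unwinds. This is a hypothetical illustration, not the allocation method of this paper: the latencies, the eager spill/refill policy and the function name `estimate_cycles` are all assumptions.

```python
# Toy model of keeping the top K recursive stack frames in scratch-pad (SPM).
# Hypothetical illustration only: the latencies and the eager spill/refill
# policy below are assumptions, not the allocation method of the paper.

SPM_CYCLES = 1    # assumed cost of one SPM word access
DRAM_CYCLES = 20  # assumed cost of one DRAM word access


def estimate_cycles(trace, spm_frames, frame_words, accesses_per_step):
    """Estimate memory cycles for a sequence of "call"/"ret" events.

    spm_frames: number of frames that fit in SPM (0 = all frames in DRAM).
    frame_words: size of one stack frame in words.
    accesses_per_step: accesses to the top frame after each event.
    """
    depth = 0
    cycles = 0
    for event in trace:
        if event == "call":
            depth += 1
            # SPM already holds spm_frames frames: spill the oldest one to
            # DRAM (read it from SPM, write it to DRAM).
            if spm_frames > 0 and depth > spm_frames:
                cycles += frame_words * (SPM_CYCLES + DRAM_CYCLES)
        elif event == "ret":
            if depth == 0:
                raise ValueError("return without matching call")
            # The frame that re-enters the SPM window is copied back from
            # DRAM (read it from DRAM, write it to SPM) before the caller resumes.
            if spm_frames > 0 and depth > spm_frames:
                cycles += frame_words * (SPM_CYCLES + DRAM_CYCLES)
            depth -= 1
        else:
            raise ValueError(f"unknown event: {event!r}")
        if depth > 0:
            # The top frame is always SPM-resident whenever spm_frames > 0.
            cost = SPM_CYCLES if spm_frames > 0 else DRAM_CYCLES
            cycles += accesses_per_step * cost
    return cycles


# Recursion 10 levels deep, then unwinding.
trace = ["call"] * 10 + ["ret"] * 10
print(estimate_cycles(trace, 0, 8, 50))  # all frames in DRAM: 19000
print(estimate_cycles(trace, 4, 8, 50))  # top 4 frames in SPM: 2966
print(estimate_cycles(trace, 0, 8, 4))   # all in DRAM, few accesses: 1520
print(estimate_cycles(trace, 4, 8, 4))   # top 4 in SPM, few accesses: 2092
```

With 50 accesses per step the SPM window wins by a wide margin, but with only 4 accesses per step the copy traffic makes the SPM variant slower than leaving every frame in DRAM. In this toy model, an allocator would therefore need some cost estimate, for example from profiling, to decide when moving frames pays off.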