An optimal memory allocation scheme for scratch-pad-based embedded systems
Source: ACM Transactions on Embedded Computing Systems (TECS)
Volume 1, Issue 1 (November 2002)
Pages: 6-26
Year of Publication: 2002
ISSN: 1539-9087
Authors
Oren Avissar, University of Maryland, College Park, MD
Rajeev Barua, University of Maryland, College Park, MD
Dave Stewart, Embedded Research Solutions, Columbia, MD
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/581888.581891

ABSTRACT

This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software.

Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy; both in the case of the SRAM size being 20% of the total data size. For some programs, less than 5% of data in SRAM achieves a similar speedup.
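The gap between a greedy and an exact static allocation can be illustrated with a small sketch. This is not the article's formulation: the abstract describes a linear optimization strategy, whereas the code below swaps in a 0/1 knapsack dynamic program for a simplified two-level memory (one scratch-pad SRAM and one DRAM). The variable names, sizes, access counts, and latencies are all hypothetical values chosen for illustration.

```python
# Illustrative sketch only: a two-level memory (SRAM + DRAM) with
# hypothetical profile data. The exact solver here is a 0/1 knapsack
# dynamic program, used as a stand-in for the linear optimization
# strategy mentioned in the abstract.

# (name, size_in_bytes, profiled_access_count) -- made-up numbers
PROFILE = [("a", 60, 600), ("b", 50, 450), ("c", 50, 450)]


def greedy_allocate(variables, sram_capacity):
    """Place variables in SRAM by descending accesses-per-byte while they fit."""
    chosen = set()
    free = sram_capacity
    by_density = sorted(variables, key=lambda v: v[2] / v[1], reverse=True)
    for name, size, _accesses in by_density:
        if size <= free:
            chosen.add(name)
            free -= size
    return chosen


def optimal_allocate(variables, sram_capacity):
    """Choose the SRAM set that maximizes profiled accesses served from SRAM."""
    # best[cap] = (accesses served from SRAM, names placed) using at most cap bytes
    best = [(0, frozenset())] * (sram_capacity + 1)
    for name, size, accesses in variables:
        # Walk capacities downward so each variable is placed at most once.
        for cap in range(sram_capacity, size - 1, -1):
            candidate = best[cap - size][0] + accesses
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] | {name})
    return set(best[sram_capacity][1])


def total_cycles(variables, in_sram, sram_latency=1, dram_latency=10):
    """Memory cycles for the profile run under assumed per-access latencies."""
    return sum(
        accesses * (sram_latency if name in in_sram else dram_latency)
        for name, _size, accesses in variables
    )


if __name__ == "__main__":
    for label, allocate in (("greedy", greedy_allocate), ("optimal", optimal_allocate)):
        placed = allocate(PROFILE, 100)
        print(label, sorted(placed), total_cycles(PROFILE, placed))
```

With a 100-byte SRAM, the greedy pass takes the densest variable first and then has no room for either of the others, while the exact solver fits the two 50-byte variables together and serves more accesses from SRAM. Under the assumed latencies this gives 9600 cycles for greedy and 6900 for the exact choice, which shows the kind of gap the abstract reports between the two strategies, though not its magnitude.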


