|
ABSTRACT
This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software.Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy; both in the case of the SRAM size being 20% of the total data size. For some programs, less than 5% of data in SRAM achieves a similar speedup.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Rajeshwari Banakar , Stefan Steinke , Bo-Sik Lee , M. Balakrishnan , Peter Marwedel, Scratchpad memory: design alternative for cache on-chip memory in embedded systems, Proceedings of the tenth international symposium on Hardware/software codesign, May 06-08, 2002, Estes Park, Colorado
[doi> 10.1145/774789.774805]
|
| |
3
|
|
| |
4
|
Bhattacharyya, S. S., Leupers, R., and Marwedel, P. 2000. Software synthesis and code generation for signal processing systems. IEEE Trans. Circuits Syst. 47, 9 (Sept.).
|
| |
5
|
Consortium, T. T. 1999. The Trimaran benchmark suite. Available at http://www.trimaran.org/.
|
| |
6
|
Guthaus, M. R., Ringenberg, J. S., Ernst, D., Austin, T. M., Mudge, T., and Brown, R. B. 2001. MiBench: A free, commercially representative embedded benchmark suite. Available at http://www.eecs.umich.edu/jringenb/mibench/.
|
 |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
Matlab 6.1. The Math Works, Inc., 2001. http://www.mathworks.com/products/matlab/.
|
| |
11
|
|
| |
12
|
CPU12 Reference Manual. Motorola Corporation, 2000. http://e-www.motorola.com/brdata/PDFDB/MICROCONTROLLERS/16_BIT/68HC12_FAMILY/REF_MAT/CPU12RM.pdf.
|
| |
13
|
M-CORE---MMC2001 Reference Manual. Motorola Corporation, 1998. http://www.motorola. com/SPS/MCORE/info_documentation.htm.
|
| |
14
|
New York City, Office of Budget and Management. 1999. Website on frequently asked questions on linear programming. http://www.eden.rutgers.edu/∼pil/FAQ.html. New York, NY.
|
| |
15
|
University of Toronto Digital Signal Processing (UTDSP). 1992. University of Toronto Digital Signal Processing (UTDSP) Benchmark Suite. Available at http://www.eecg.toronto.edu/.
|
 |
16
|
|
| |
17
|
Paulin, P., Liem, C., Cornero, M., Nacabal, F., and Goossens, G. 1997. Embedded software in real-time signal processing systems: Application and architecture trends. Invited paper, Proc. IEEE 85, 3 (Mar.).
|
| |
18
|
Rutter, P., Orost, J., and Gloistein, D. BTOA: Binary to printable ASCII converter source code. Available at http://www.bookcase.com/library/software/msdos.devel.lang.c.html.
|
| |
19
|
Sjodin, J., Froderberg, B., and Lindgren, T. 1998. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems. Dec.
|
 |
20
|
Jan Sjödin , Carl von Platen, Storage allocation for embedded processors, Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems, November 16-17, 2001, Atlanta, Georgia, USA
[doi> 10.1145/502217.502221]
|
| |
21
|
TMS370Cx7x 8-bit microcontroller. Texas Instruments, Revised Feb. 1997. http://www-s.ti.com/sc/psheets/spns034c/spns034c.pdf.
|
CITED BY 44
|
|
Nghi Nguyen , Angel Dominguez , Rajeev Barua, Memory allocation for embedded systems with a compile-time-unknown scratch-pad size, Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, September 24-27, 2005, San Francisco, California, USA
|
|
|
|
|
|
Bernhard Egger , Chihun Kim , Choonki Jang , Yoonsung Nam , Jaejin Lee , Sang Lyul Min, A dynamic code placement technique for scratchpad memory using postpass optimization, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Peter Marwedel , Lars Wehmeyer , Manish Verma , Stefan Steinke , Urs Helmig, Fast, predictable and low energy memory references through architecture-aware compilation, Proceedings of the 2004 conference on Asia South Pacific design automation: electronic design and solution fair, p.4-11, January 27-30, 2004, Yokohama, Japan
|
|
|
Chanik Park , Junghee Lim , Kiwon Kwon , Jaejin Lee , Sang Lyul Min, Compiler-assisted demand paging for embedded systems with flash memory, Proceedings of the 4th ACM international conference on Embedded software, September 27-29, 2004, Pisa, Italy
|
|
|
|
|
|
|
|
|
Michael K. Chen , Xiao Feng Li , Ruiqi Lian , Jason H. Lin , Lixia Liu , Tao Liu , Roy Ju, Shangri-La: achieving high performance from compiled network applications while enabling ease of programming, ACM SIGPLAN Notices, v.40 n.6, June 2005
|
|
|
|
|
|
|
|
|
Hae-woo Park , Kyoungjoo Oh , Soyoung Park , Myoung-min Sim , Soonhoi Ha, Dynamic code overlay of SDF-modeled programs on low-end embedded systems, Proceedings of the conference on Design, automation and test in Europe: Proceedings, March 06-10, 2006, Munich, Germany
|
|
|
Shane Ryoo , Christopher I. Rodrigues , Sam S. Stone , Sara S. Baghsorkhi , Sain-Zee Ueng , John A. Stratton , Wen-mei W. Hwu, Program optimization space pruning for a multithreaded gpu, Proceedings of the sixth annual IEEE/ACM international symposium on Code generation and optimization, April 05-09, 2008, Boston, MA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Doosan Cho , Ilya Issenin , Nikil Dutt , Jonghee W. Yoon , Yunheung Paek, Software controlled memory layout reorganization for irregular array access patterns, Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 30-October 03, 2007, Salzburg, Austria
|
|
|
|
|
|
Nghi Nguyen , Angel Dominguez , Rajeev Barua, Scratch-pad memory allocation without compiler support for java applications, Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 30-October 03, 2007, Salzburg, Austria
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Angel Dominguez , Nghi Nguyen , Rajeev K. Barua, Recursive function data allocation to scratch-pad memory, Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 30-October 03, 2007, Salzburg, Austria
|
|
|
|
|
|
Ben Lickly , Isaac Liu , Sungjun Kim , Hiren D. Patel , Stephen A. Edwards , Edward A. Lee, Predictable programming on a precision timed architecture, Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, October 19-24, 2008, Atlanta, GA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|