ABSTRACT
In this work, we present a dynamic memory allocation technique for a novel, horizontally partitioned memory subsystem targeting contemporary embedded processors with a memory management unit (MMU). We propose to replace the on-chip instruction cache with a scratchpad memory (SPM) and a small minicache. Serializing the address translation with the actual memory access lets the memory system access only the SPM or only the minicache for a given fetch, rather than both. Independent of the SPM size and based solely on profiling information, a postpass optimizer classifies the code of an application binary into a pageable and a cacheable code region. The cacheable region is placed at a fixed location in the external memory and cached by the minicache. The pageable region is copied on demand to the SPM before execution. Both the pageable code region and the SPM are logically divided into pages the size of an MMU memory page. Using the MMU's pagefault exception mechanism, a runtime scratchpad memory manager (SPMM) tracks page accesses and copies frequently executed code pages to the SPM before they are executed. Because larger MMU pages make each page transfer from the external memory to the SPM more costly, good code placement techniques become more important as the page size increases. We discuss code-grouping techniques and provide an analysis of the effect of the MMU's page size on execution time, energy consumption, and external memory accesses. We show that by using the data cache as a victim buffer for the SPM, significant energy savings are possible. We evaluate our SPM allocation strategy with fifteen applications, including H.264, MP3, MPEG-4, and PGP. The proposed memory system requires 8% less die area compared to a fully-cached configuration. On average, we achieve a 31% improvement in runtime performance and a 35% reduction in energy consumption with an MMU page size of 256 bytes.
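
The on-demand paging idea described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the SPMM from the paper: the class name `ToySpmManager`, the helper `run_trace`, and the FIFO eviction policy are all hypothetical choices made for illustration, and the abstract does not say which replacement policy the real manager uses. The 256-byte page size is the one the abstract reports results for.

```python
from collections import OrderedDict

PAGE_SIZE = 256  # bytes; the MMU page size the abstract reports results for


class ToySpmManager:
    """Toy model of a pagefault-driven SPM manager.

    Assumption: pages are evicted in FIFO order. The real SPMM may use
    a different policy; this class is purely illustrative.
    """

    def __init__(self, spm_pages):
        self.spm_pages = spm_pages      # number of page frames in the SPM
        self.resident = OrderedDict()   # page number -> frame index, in load order
        self.page_faults = 0

    def fetch(self, address):
        """Simulate a fetch from the pageable region; return the SPM offset."""
        page = address // PAGE_SIZE
        if page not in self.resident:
            self._handle_page_fault(page)
        return self.resident[page] * PAGE_SIZE + address % PAGE_SIZE

    def _handle_page_fault(self, page):
        self.page_faults += 1
        if len(self.resident) >= self.spm_pages:
            # SPM is full: reuse the frame of the oldest resident page.
            _, frame = self.resident.popitem(last=False)
        else:
            frame = len(self.resident)
        # Recording the mapping stands in for copying the page into the SPM.
        self.resident[page] = frame


def run_trace(trace, spm_pages):
    """Replay a list of code addresses and return the number of page faults."""
    manager = ToySpmManager(spm_pages)
    for address in trace:
        manager.fetch(address)
    return manager.page_faults
```

Replaying the same address trace with different values of `spm_pages` shows how the number of page transfers depends on how many pages fit in the SPM, which is the quantity that code-grouping techniques aim to reduce.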
INDEX TERMS
Primary Classification:
C.
Computer Systems Organization
C.4
PERFORMANCE OF SYSTEMS
Additional Classification:
D.
Software
D.3
PROGRAMMING LANGUAGES
D.3.4
Processors
Subjects:
Optimization;
Compilers;
Code generation
D.4
OPERATING SYSTEMS
D.4.2
Storage Management
Subjects:
Virtual memory;
Storage hierarchies;
Secondary storage
General Terms:
Algorithms,
Design,
Experimentation,
Management,
Measurement,
Performance
Keywords:
Code placement,
compilers,
heterogeneous memory,
paging,
portable systems,
postpass optimization,
scratchpad,
victim cache,
virtual memory