ABSTRACT
ScratchPad Memories (SPMs) are commonly used in embedded systems because they are more energy-efficient than caches and give applications tighter control over the memory hierarchy. Optimally mapping code and data to SPMs is, however, still a challenge. This paper proposes an optimal scratchpad mapping approach for code segments that works directly on application binaries and therefore requires no access to either the compiler or the application source code, a clear advantage for legacy or proprietary, IP-protected applications. The mapping problem is solved by means of a Dynamic Programming algorithm applied to the execution traces of the target application. The algorithm finds the optimal set of instruction blocks to be moved into a dedicated SPM, minimizing either energy consumption or execution time. A patching tool, which can use the output of the optimal mapper, modifies the application binary and moves the relevant portions of its code segments to memory locations inside the SPM.
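The abstract does not spell out the Dynamic Programming formulation, so the following is only a minimal sketch of one plausible reading: choosing instruction blocks for a fixed-size SPM treated as a textbook 0/1 knapsack problem. The function name `select_blocks`, the `(size, saving)` tuple layout, and the idea that each block's saving is precomputed from the execution trace are all assumptions made for illustration, not the authors' algorithm.

```python
from typing import List, Tuple


def select_blocks(blocks: List[Tuple[int, int]], capacity: int) -> Tuple[int, List[int]]:
    """Pick the blocks that maximize total saving within the SPM capacity.

    blocks   -- hypothetical (size_in_bytes, saving) pairs, one per candidate
                instruction block; the saving is assumed to come from a trace
                (e.g. fetch count times per-fetch energy or cycle gain).
    capacity -- SPM size in bytes.
    Returns (best_total_saving, sorted indices of the chosen blocks).
    """
    n = len(blocks)
    # best[i][c] = maximum saving using only the first i blocks and c bytes.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (size, saving) in enumerate(blocks, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: leave block i-1 in main memory
            if size <= c:
                # option 2: place block i-1 in the SPM
                best[i][c] = max(best[i][c], best[i - 1][c - size] + saving)

    # Walk the table backwards to recover which blocks were selected.
    chosen: List[int] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= blocks[i - 1][0]
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    # Made-up block sizes and savings, purely for illustration.
    candidates = [(64, 500), (128, 900), (96, 700), (32, 100)]
    print(select_blocks(candidates, 192))
```

Under these assumptions the table has (number of blocks + 1) × (capacity + 1) entries, so the run time is pseudo-polynomial in the SPM size. Switching the objective between energy and execution time would only change how each block's saving is computed from the trace.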
INDEX TERMS
Primary Classification:
C. Computer Systems Organization
C.3 SPECIAL-PURPOSE AND APPLICATION-BASED SYSTEMS
Subjects: Real-time and embedded systems
Additional Classification:
C. Computer Systems Organization
C.4 PERFORMANCE OF SYSTEMS
Subjects: Performance attributes; Design studies
C.5 COMPUTER SYSTEM IMPLEMENTATION
C.5.3 Microcomputers
F. Theory of Computation
F.1 COMPUTATION BY ABSTRACT DEVICES
F.1.3 Complexity Measures and Classes
General Terms: Algorithms, Design, Performance
Keywords: design automation, dynamic programming, embedded design, executable patching, memory hierarchy, optimization algorithm, post-compiler processing, power saving, scratchpad memory