ABSTRACT
Scratchpad memory has been introduced as a replacement for cache memory because it improves the performance of certain embedded systems. It has also been demonstrated that scratchpad memory can significantly reduce the energy consumption of the memory hierarchy, which matters because the memory hierarchy consumes a substantial proportion of the total energy of an embedded system. This paper addresses the optimization of the instruction scratchpad memory with a novel methodology based on a metric that we call concomitance. The metric is used to find basic blocks that are executed frequently and in close proximity in time. Once found, such blocks are copied into the scratchpad memory at appropriate times by a special instruction inserted at appropriate places in the code. For a set of benchmarks taken from MediaBench, our scratchpad system consumed on average just 59% of the energy of the cache system and 73% of the energy of the state-of-the-art scratchpad system, while improving overall performance. Compared to the state-of-the-art method, the number of instructions copied from main memory into the scratchpad memory is reduced by 88%.
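The abstract does not give the formal definition of concomitance, so the following is only an illustrative sketch of the general idea of scoring basic blocks that execute frequently and close together in time. It assumes a recorded trace of basic-block identifiers and a fixed look-back window; the function name `pairwise_proximity`, the `window` parameter, and the counting rule are hypothetical and are not taken from the paper.

```python
from collections import Counter, deque


def pairwise_proximity(trace, window):
    """Illustrative proximity score, NOT the paper's concomitance metric.

    For every unordered pair of distinct basic blocks, count how many times
    one of them executes while the other is among the previous `window`
    entries of the execution trace.
    """
    counts = Counter()
    recent = deque(maxlen=window)  # the last `window` executed blocks
    for block in trace:
        # Each distinct block seen recently is "close in time" to `block`.
        for other in set(recent):
            if other != block:
                counts[frozenset((block, other))] += 1
        recent.append(block)
    return counts


# Hypothetical trace of basic-block labels.
trace = ["A", "B", "A", "B", "C"]
scores = pairwise_proximity(trace, window=2)
for pair, score in scores.most_common():
    print(sorted(pair), score)
```

Under these assumptions, pairs with high scores would be candidates for residing in the scratchpad at the same time, while low-scoring pairs could share space. The actual selection and copy-placement steps in the paper may differ.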