ABSTRACT
While hardware instruction caches are present in virtually all general-purpose and high-performance microprocessors today, many embedded processors use SRAM or scratchpad memories instead. These are simple array memory structures that are directly addressed and explicitly managed by software. Compared to hardware caches of the same data capacity, they are smaller, have shorter access times, and consume less energy per access. Access times are also easier to predict with simple memories since there is no possibility of a "miss." On the other hand, they are more difficult for the programmer to use since they are not automatically managed.

In this paper, we present a software system that allows all or part of an SRAM or scratchpad memory to be automatically managed as a cache. This system provides the programming convenience of a cache for processors that lack dedicated caching hardware. It has been implemented for an actual processor and runs on real hardware. Our results show that a software-based instruction cache can be built that provides performance within 10% of a traditional hardware cache on many benchmarks while using a cheaper, simpler SRAM memory. On these same benchmarks, energy consumption is up to 3% lower than it would be using a hardware cache.