SuSeSim: a fast simulation strategy to find optimal L1 cache configuration for embedded systems
Source
International Conference on Hardware Software Codesign
Proceedings of the 7th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis
Grenoble, France
SESSION: Efficient techniques for architecture simulation
Pages 295-304  
Year of Publication: 2009
ISBN: 978-1-60558-628-1
Authors
Mohammad Shihabul Haque  University of New South Wales, Sydney, Australia
Andhi Janapsatya  University of New South Wales, Sydney, Australia
Sri Parameswaran  University of New South Wales, Sydney, Australia
Sponsors
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
SIGDA: ACM Special Interest Group on Design Automation
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1629435.1629476

ABSTRACT

Simulating an application is a popular and reliable way to find the optimal level-one (L1) cache configuration for an application-specific embedded processor. However, long simulation time is one of the main disadvantages of simulation-based approaches. In this paper, we propose a new, fast simulation method, the Super Set Simulator (SuSeSim). Whereas previous methods use a top-down search strategy, SuSeSim uses a bottom-up search strategy together with a new, elaborate data structure to reduce the search space that must be examined to determine a cache hit or miss. SuSeSim can simulate hundreds of cache configurations simultaneously while reading an application's memory request trace just once, and it records the total numbers of cache hits and misses accurately. Depending on the cache block size and the benchmark application, SuSeSim reduces the number of tags to be checked by up to 43% compared with the fastest existing simulation approach (the CRCB algorithm). With a faster search and an easy-to-maintain data structure, SuSeSim can simulate memory requests up to 94% faster than the CRCB algorithm.
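
To make the problem setting concrete, the following is a minimal sketch of single-pass, multi-configuration cache simulation. It is a naive baseline, not SuSeSim's data structure or the CRCB algorithm: every configuration is modelled as an independent cache, so each memory request is checked against every configuration separately, which is exactly the redundant tag checking that the abstract says SuSeSim reduces. The class and function names (`LRUCache`, `simulate`), the `(num_sets, assoc, block_size)` tuple layout, and the choice of LRU replacement are assumptions made for illustration and do not come from the paper.

```python
from collections import OrderedDict


class LRUCache:
    """One set-associative cache configuration with LRU replacement (illustrative only)."""

    def __init__(self, num_sets, assoc, block_size):
        self.num_sets = num_sets
        self.assoc = assoc
        self.block_size = block_size
        # One ordered dict per set: keys are tags, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record a hit or miss for one memory request."""
        block = address // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) >= self.assoc:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True


def simulate(trace, configs):
    """Read the trace once and update every configuration per request.

    `configs` is a list of hypothetical (num_sets, assoc, block_size) tuples.
    Returns a dict mapping each tuple to its (hits, misses) counts.
    """
    caches = [LRUCache(*cfg) for cfg in configs]
    for address in trace:  # single pass over the memory request trace
        for cache in caches:  # naive: every configuration is searched independently
            cache.access(address)
    return {cfg: (c.hits, c.misses) for cfg, c in zip(configs, caches)}


if __name__ == "__main__":
    trace = [0, 16, 0, 32, 0, 16]
    configs = [(1, 1, 16), (1, 2, 16)]
    for cfg, counts in simulate(trace, configs).items():
        print(cfg, counts)
    # prints:
    # (1, 1, 16) (0, 6)
    # (1, 2, 16) (2, 4)
```

In this baseline the work per request grows linearly with the number of configurations. Methods such as the ones the abstract compares aim to share information across configurations so that fewer tags need to be checked per request.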

