Zero cost indexing for improved processor cache performance
Source: ACM Transactions on Design Automation of Electronic Systems (TODAES)
Volume 11, Issue 1 (January 2006)
Pages: 3-25
Year of Publication: 2006
ISSN:1084-4309
Author: Tony Givargis, University of California, Irvine, Irvine, CA
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1124713.1124715

ABSTRACT

The increasing use of microprocessor cores in embedded systems, as well as in mobile and portable devices, creates an opportunity for customizing the cache subsystem for improved performance. In traditional cache design, the index portion of the memory address bus consists of the K least significant bits, where K = log2(D) and D is the depth of the cache. However, in devices where the application set is known and characterized (e.g., systems that execute a fixed application set), there is an opportunity to improve cache performance by choosing a near-optimal set of bits to use as the index into the cache. This technique does not add any overhead in terms of area or delay. In this article, we present an efficient heuristic algorithm for selecting K index bits for improved cache performance. We show the feasibility of our algorithm by applying it to a large number of embedded system applications as well as the integer SPEC CPU 2000 benchmarks. Specifically, for data traces, we show up to a 45% reduction in cache misses. Likewise, for instruction traces, we show up to a 31% reduction in cache misses. When a unified data/instruction cache architecture is considered, our results show an average improvement of 14.5% for the Powerstone benchmarks and an average improvement of 15.2% for the SPEC'00 benchmarks.
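
The difference between conventional indexing and a chosen set of index bits can be illustrated with a small simulation. The sketch below is not the article's heuristic, which the abstract does not describe; it only assumes a direct-mapped cache of depth D = 2^K and shows how the choice of index bits changes the miss count on a given trace. The function names, the two-address trace, and the bit positions are hypothetical, chosen purely for illustration.

```python
from typing import Iterable, Sequence


def extract_index(addr: int, bit_positions: Sequence[int]) -> int:
    """Gather the chosen address bits into a compact K-bit cache index."""
    index = 0
    for out_bit, pos in enumerate(bit_positions):
        index |= ((addr >> pos) & 1) << out_bit
    return index


def count_misses(trace: Iterable[int], bit_positions: Sequence[int]) -> int:
    """Simulate a direct-mapped cache with D = 2**K lines and count misses.

    For simplicity the full address is stored as the tag, so the result is
    correct for any choice of index bits.
    """
    lines = [None] * (1 << len(bit_positions))
    misses = 0
    for addr in trace:
        idx = extract_index(addr, bit_positions)
        if lines[idx] != addr:
            misses += 1
            lines[idx] = addr
    return misses


# Hypothetical trace: two block addresses that differ only in bit 4,
# accessed alternately (an assumption made for this illustration).
K = 2
trace = [0x00, 0x10] * 8

conventional = list(range(K))  # the K least significant bits: [0, 1]
custom = [0, 4]                # an alternative choice that includes bit 4

print("conventional index bits:", count_misses(trace, conventional), "misses")
print("custom index bits:      ", count_misses(trace, custom), "misses")
```

With the low-order bits as index, both addresses map to the same line and evict each other on every access; once bit 4 is part of the index, they occupy different lines and only the two compulsory misses remain. Both configurations use the same number of index bits, so the cache depth is unchanged.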

