Reducing cache misses through programmable decoders
Source
ACM Transactions on Architecture and Code Optimization (TACO) archive
Volume 4, Issue 4 (January 2008)
Article No. 5
Year of Publication: 2008
ISSN: 1544-3566
Author
Chuanjun Zhang  University of Missouri-Kansas City, Kansas City, Missouri
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1328195.1328200

ABSTRACT

Level-one caches normally reside on a processor's critical path, which determines clock frequency, so fast access to the level-one cache is important. Direct-mapped caches have faster access times but poorer hit rates than same-sized set-associative caches because accesses are distributed nonuniformly across the cache sets: some sets incur many misses while others are underutilized. We propose to increase the decoder length and, hence, reduce the accesses to heavily used sets without dynamically detecting cache set usage. We increase the accesses to underutilized cache sets by incorporating a replacement policy into the cache design using programmable decoders. On average, the proposed techniques achieve a miss rate as low as that of a traditional 4-way cache across all 26 SPEC2K benchmarks, for both the instruction and data caches. This translates into average IPC improvements of 21.5% and 42.4% for the SPEC2K integer and floating-point benchmarks, respectively. The proposed cache, the B-Cache, consumes 10.5% more power per access but achieves a 12% saving in total memory-access-related energy as a result of the reduced miss rate and the resulting reduction in application execution time. Compared with previous techniques that aim to reduce the miss rate of direct-mapped caches, our technique services every cache hit in a single cycle and has the same access time as a direct-mapped cache.
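
The conflict-miss problem that motivates the abstract can be illustrated with a small simulation. The sketch below is not the B-Cache or its programmable decoder; it only compares a conventional direct-mapped cache with a same-sized 4-way LRU cache on a made-up trace. The function name `simulate`, the 32-byte line size, the 256-line capacity, and the two addresses in the trace are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def simulate(trace, num_lines, ways, line_size=32):
    """Return (misses, per_set_accesses) for a cache with LRU replacement.

    ways=1 models a direct-mapped cache; ways>1 models a set-associative
    cache with the same total number of lines.
    """
    num_sets = num_lines // ways
    # Each set maps tag -> True, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    per_set_accesses = [0] * num_sets
    misses = 0
    for addr in trace:
        block = addr // line_size      # drop the byte offset within a line
        index = block % num_sets       # set selected by the index bits
        tag = block // num_sets        # remaining high-order bits
        per_set_accesses[index] += 1
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses, per_set_accesses


# Hypothetical trace: two addresses that map to the same set in both caches.
trace = [0x0000, 0x2000] * 8

dm_misses, dm_usage = simulate(trace, num_lines=256, ways=1)
sa_misses, sa_usage = simulate(trace, num_lines=256, ways=4)
print("direct-mapped misses:", dm_misses)
print("4-way misses:", sa_misses)
print("direct-mapped sets touched:", sum(1 for n in dm_usage if n))
```

In this trace the two blocks evict each other on every access in the direct-mapped cache while 255 of its 256 sets stay idle, whereas the 4-way cache keeps both blocks resident after the first two cold misses. This is the nonuniform set usage the abstract describes; the paper's contribution is to reach a comparable miss rate without giving up direct-mapped access time.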

