Towards hybrid last level caches for chip-multiprocessors
Source
ACM SIGARCH Computer Architecture News archive
Volume 36, Issue 2 (May 2008)
SPECIAL ISSUE: DASCMP'07
Pages 56-63  
Year of Publication: 2008
ISSN:0163-5964
Authors
Li Zhao, Intel Corporation
Ravi Iyer, Intel Corporation
Mike Upton, Intel Corporation
Don Newell, Intel Corporation
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1399972.1399982

ABSTRACT

As CMP platforms are widely adopted, more and more cores are integrated onto the die. To reduce off-chip memory accesses, the last-level cache is usually organized as a distributed shared cache. To avoid hot spots, cache lines are interleaved across the distributed shared cache slices using a hash function. However, as the number of cores and cache slices in the platform increases, most data references go to remote cache slices, which increases the access latency significantly. In this paper, we propose a hybrid last-level cache, in which each cache slice has some amount of private space and some amount of shared space. For workloads with no sharing, the goal is to provide more hits in the local slice while still keeping the overall miss rate low. For workloads with sufficient sharing, the goal is to allow more sharing in the last-level cache slice. We present hybrid last-level cache design options and study their hit/miss rate behavior for a number of important server applications and multi-programmed workloads. Our simulation results, from running multi-programmed workloads based on SPEC CINT2000 as well as multithreaded workloads based on commercial server benchmarks (TPCC, SPECjbb, SAP and TPCE), show that this architecture is advantageous, especially because it can improve the local hit rate significantly while keeping the overall miss rate similar to that of the shared cache.
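
The effect of hashed interleaving on locality can be illustrated with a minimal Python sketch. The abstract does not specify the hash function, so the XOR-fold hash, the 64-byte line size, and all function names below are assumptions chosen for illustration, not the paper's design. If lines are spread evenly over N slices, a core finds only about 1/N of them in its local slice, so roughly (N - 1)/N of references go remote.

```python
LINE_BYTES = 64  # assumed cache line size (not stated in the abstract)


def home_slice(addr: int, num_slices: int) -> int:
    """Map an address to a cache slice with a hypothetical XOR-fold hash."""
    if num_slices <= 0 or num_slices & (num_slices - 1):
        raise ValueError("num_slices must be a power of two")
    bits = num_slices.bit_length() - 1      # log2(num_slices)
    line = addr // LINE_BYTES               # cache line index
    # Fold the next-higher index bits into the low bits to spread lines.
    return (line ^ (line >> bits)) & (num_slices - 1)


def remote_fraction(num_slices: int) -> float:
    """Expected share of remote references under uniform interleaving."""
    return 1.0 - 1.0 / num_slices


def measured_remote_fraction(core_slice: int, num_slices: int,
                             num_lines: int) -> float:
    """Share of the first num_lines lines whose home is not core_slice."""
    remote = sum(
        home_slice(line * LINE_BYTES, num_slices) != core_slice
        for line in range(num_lines)
    )
    return remote / num_lines


if __name__ == "__main__":
    print(remote_fraction(4))                    # 0.75
    print(remote_fraction(16))                   # 0.9375
    print(measured_remote_fraction(0, 16, 256))  # 0.9375
```

With 16 slices, about 94% of lines map to a remote slice in this model, which is the latency problem the proposed private portion of each slice is meant to reduce.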



