ABSTRACT
As CMP platforms are widely adopted, more and more cores are integrated onto the die. To reduce off-chip memory accesses, the last-level cache is usually organized as a distributed shared cache. To avoid hot spots, cache lines are interleaved across the distributed shared cache slices using a hash function. However, as the number of cores and cache slices in the platform increases, most data references go to remote cache slices, which increases access latency significantly. In this paper, we propose a hybrid last-level cache in which each cache slice provides some amount of private space and some amount of shared space. For workloads with no sharing, the goal is to provide more hits in the local slice while still keeping the overall miss rate low. For workloads with sufficient sharing, the goal is to allow more sharing in the last-level cache slice. We present hybrid last-level cache design options and study their hit/miss rate behavior for a number of important server applications and multi-programmed workloads. Our simulation results for multi-programmed workloads based on SPEC CINT2000 and multithreaded workloads based on commercial server benchmarks (TPC-C, SPECjbb, SAP, and TPC-E) show that this architecture is advantageous, especially because it can improve the local hit rate significantly while keeping the overall miss rate similar to that of the shared cache.
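The effect of hash interleaving on locality can be illustrated with a small sketch. The abstract does not specify the hash function, so the XOR-fold hash, the 64-byte line size, and the names `slice_of` and `local_fraction` below are assumptions chosen for illustration only, not the authors' design.

```python
LINE_BYTES = 64  # assumed cache-line size; not stated in the abstract


def slice_of(addr: int, num_slices: int) -> int:
    """Map a physical address to a cache slice index.

    Hypothetical XOR-fold hash over the line address; requires a
    power-of-two slice count so the result stays in [0, num_slices).
    """
    if num_slices <= 0 or num_slices & (num_slices - 1):
        raise ValueError("num_slices must be a power of two")
    if num_slices == 1:
        return 0
    bits = num_slices.bit_length() - 1  # log2(num_slices)
    mask = num_slices - 1
    line = addr // LINE_BYTES
    h = 0
    while line:
        h ^= line & mask  # fold successive bit groups together
        line >>= bits
    return h


def local_fraction(num_slices: int, local_slice: int = 0,
                   num_lines: int = 4096) -> float:
    """Fraction of consecutive cache lines that map to local_slice."""
    local = sum(
        1
        for i in range(num_lines)
        if slice_of(i * LINE_BYTES, num_slices) == local_slice
    )
    return local / num_lines


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(n, local_fraction(n))
```

Under this assumed uniform interleaving, a core finds only about 1/N of the lines it references in its own slice of an N-slice cache, so the remote share grows with the slice count. This is the latency problem that reserving private space in each slice is meant to address.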