Leveraging on-chip networks for data cache migration in chip multiprocessors
Source
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Toronto, Ontario, Canada
Session: Multicore memory hierarchy design (part 2)
Pages 197-207
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Noel Eisley, Princeton University, Princeton, NJ, USA
Li-Shiuan Peh, Princeton University, Princeton, NJ, USA
Li Shang, University of Colorado - Boulder, Boulder, CO, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1454115.1454144

ABSTRACT

Recently, chip multiprocessors (CMPs) have arisen as the de facto design for modern high-performance processors, with increasing core counts. An important property of CMPs is that remote, but on-chip, L2 cache accesses are less costly than off-chip accesses; this is in contrast to earlier chip-to-chip or board-to-board multiprocessors, where an access to a remote node is just as costly as, if not more costly than, a main memory access. This motivates on-chip cache migration as a means to retain more data on-chip. However, previously proposed techniques do not scale to high core counts: they neither leverage the on-chip caches of all cores nor provide a scalable migration mechanism. In this paper we propose a scalable in-network migration technique which uses hints embedded within the router microarchitecture to steer L2 cache evictions towards free/invalid cache slots in any on-chip core cache, rather than evicting them off-chip. We show that our technique can provide an average 19% reduction in the number of off-chip memory accesses over the state-of-the-art, beating the performance of a pseudo-optimal migration technique. This can be done with negligible area overhead and a manageable traffic overhead of 13.4%.
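
The idea of steering evictions toward free slots can be illustrated with a toy model. The sketch below is not the paper's mechanism: it assumes a 2D mesh, assumes each router's hint is simply the hop distance to the nearest cache with a free slot, and recomputes those hints globally on every eviction, whereas real router hardware would presumably keep and update them locally. All names (`build_hints`, `migrate_eviction`, `free_slots`) are hypothetical.

```python
from collections import deque

def neighbors(node, width, height):
    """Yield the mesh neighbors of a node given as (x, y)."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

def build_hints(free_slots, width, height):
    """Assumed hint: hop distance from each router to the nearest free slot.

    Computed with a multi-source breadth-first search that starts from
    every node whose cache currently has at least one free slot.
    """
    dist = {node: 0 for node, free in free_slots.items() if free > 0}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nb in neighbors(node, width, height):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def migrate_eviction(source, free_slots, width, height):
    """Steer one evicted line hop by hop toward a free slot.

    Returns the node that absorbs the line, or None when no on-chip
    slot is free and the line would have to go off-chip.
    """
    hints = build_hints(free_slots, width, height)
    if not hints:
        return None
    node = source
    while hints[node] > 0:
        # Follow the neighbor whose hint promises the shortest distance.
        node = min(neighbors(node, width, height), key=lambda nb: hints[nb])
    free_slots[node] -= 1
    return node

# Usage: a 3x3 mesh where only the far corner has one free slot.
free = {(x, y): 0 for x in range(3) for y in range(3)}
free[(2, 2)] = 1
first = migrate_eviction((0, 0), free, 3, 3)   # absorbed on-chip
second = migrate_eviction((0, 0), free, 3, 3)  # nothing free: off-chip
```

In this toy run the first eviction lands in the one free slot and the second finds none, so it falls back to an off-chip eviction.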
