Leveraging on-chip networks for data cache migration in chip multiprocessors
Source: PACT, Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Toronto, Ontario, Canada
SESSION: Multicore memory hierarchy design (part 2)
Pages 197-207
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Noel Eisley, Princeton University, Princeton, NJ, USA
Li-Shiuan Peh, Princeton University, Princeton, NJ, USA
Li Shang, University of Colorado - Boulder, Boulder, CO, USA
Bibliometrics: Downloads (6 Weeks): 12, Downloads (12 Months): 119, Citation Count: 1
ABSTRACT
Chip multiprocessors (CMPs) have become the de facto design for modern high-performance processors, and their core counts keep increasing. An important property of CMPs is that remote, but on-chip, L2 cache accesses are less costly than off-chip accesses; this contrasts with earlier chip-to-chip or board-to-board multiprocessors, where an access to a remote node is at least as costly as a main memory access. This motivates on-chip cache migration as a means of retaining more data on-chip. However, previously proposed techniques do not scale to high core counts: they neither leverage the on-chip caches of all cores nor provide a scalable migration mechanism. In this paper we propose a scalable in-network migration technique that uses hints embedded within the router microarchitecture to steer L2 cache evictions towards free or invalid cache slots in any on-chip core cache, rather than evicting them off-chip. We show that our technique can provide an average 19% reduction in the number of off-chip memory accesses over the state of the art, beating the performance of a pseudo-optimal migration technique. This can be done with negligible area overhead and a manageable traffic overhead of 13.4%.
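The steering idea described in the abstract can be illustrated with a toy model. The following is a minimal sketch and not the authors' mechanism: it assumes a 2D mesh, a single free-slot counter per core cache, and a per-router hint equal to the hop distance to the nearest cache with a free slot. The function names (`neighbors`, `build_hints`, `steer_eviction`) and the hint encoding are hypothetical, chosen only for illustration.

```python
from collections import deque


def neighbors(node, width, height):
    """Yield the mesh neighbors of a node (assumed 2D mesh topology)."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def build_hints(free_slots, width, height):
    """Hypothetical hint table: hop distance from each router to the
    nearest cache with a free slot, computed by multi-source BFS."""
    dist = {node: 0 for node, free in free_slots.items() if free > 0}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node, width, height):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def steer_eviction(start, free_slots, width, height):
    """Route an evicted line hop by hop towards the smallest hint.

    Returns (path, destination). The destination is None when no cache
    has a free slot, which models an off-chip eviction.
    """
    hints = build_hints(free_slots, width, height)
    if start not in hints:  # no free slot anywhere on chip
        return [start], None
    path, node = [start], start
    while hints[node] > 0:
        # Each router forwards to the neighbor advertising the nearest free slot.
        node = min(neighbors(node, width, height),
                   key=lambda n: hints.get(n, float("inf")))
        path.append(node)
    free_slots[node] -= 1  # the migrated line occupies the free slot
    return path, node
```

As a usage example under these assumptions, on a 3x3 mesh where only the cache at `(2, 2)` has one free slot, `steer_eviction((0, 0), free_slots, 3, 3)` delivers the evicted line to `(2, 2)` over four hops; a second eviction then finds no free slot and returns `None` as the destination, modeling an off-chip eviction. The sketch recomputes hints globally on every call for simplicity, whereas the paper's technique keeps hints inside the routers.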