ABSTRACT
The ever-increasing sizes of on-chip caches and the growing domination of wire delay necessitate significant changes to cache hierarchy design methodologies. Many recent proposals advocate splitting the cache into a large number of banks and employing a network-on-chip (NoC) to allow fast access to nearby banks, an organization referred to as a Non-Uniform Cache Architecture (NUCA). Most studies on NUCA organizations have assumed a generic NoC and focused on logical policies for cache block placement, movement, and search. Since wire and router delay and power are major limiting factors in modern processors, this work focuses on interconnect design and its influence on NUCA performance and power. We extend the widely used CACTI cache modeling tool to take network design parameters into account. With these overheads appropriately accounted for, the optimal cache organization is typically very different from that assumed in prior NUCA studies. To alleviate the interconnect delay bottleneck, we propose novel cache access optimizations that introduce heterogeneity within the inter-bank network. Careful consideration of interconnect choices for a large cache results in a 51% performance improvement over a baseline generic NoC, and the introduction of heterogeneity within the network yields an additional 11-15% performance improvement.
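To illustrate why access time in a banked cache is non-uniform, the following minimal sketch estimates the latency of reaching each bank over a 2D mesh from a cache controller placed at one corner. It is not the extended CACTI model described above. The function names, the mesh topology, the controller position, and all cycle counts (`bank_cycles`, `router_cycles`, `link_cycles`) are hypothetical values chosen only for illustration.

```python
def bank_access_latency(row, col, bank_cycles=3, router_cycles=1, link_cycles=1):
    """Cycles to reach and read the bank at (row, col).

    Assumes a 2D mesh with the cache controller at (0, 0) and
    dimension-ordered routing, so the hop count is the Manhattan distance.
    All default cycle counts are illustrative assumptions.
    """
    hops = row + col
    return hops * (router_cycles + link_cycles) + bank_cycles


def average_latency(rows, cols, **costs):
    """Mean access latency over all banks of a rows x cols grid."""
    total = sum(
        bank_access_latency(r, c, **costs)
        for r in range(rows)
        for c in range(cols)
    )
    return total / (rows * cols)


if __name__ == "__main__":
    # Nearest and farthest banks of a 4x4 grid under the assumed costs.
    print(bank_access_latency(0, 0))
    print(bank_access_latency(3, 3))
    print(average_latency(4, 4))
```

In this toy model the per-hop cost of the router and link, rather than the bank access itself, dominates the latency of distant banks. This is the kind of overhead the abstract argues must be accounted for when choosing a bank count and network design.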