Interconnect design considerations for large NUCA caches
Source
International Symposium on Computer Architecture
Proceedings of the 34th Annual International Symposium on Computer Architecture
San Diego, California, USA
SESSION: Memory and caches
Pages: 369-380
Year of Publication: 2007
ISBN: 978-1-59593-706-3
Authors
Naveen Muralimanohar  University of Utah, Salt Lake City, UT
Rajeev Balasubramonian  University of Utah, Salt Lake City, UT
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS : Computer Society
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1250662.1250708

ABSTRACT

The ever-increasing sizes of on-chip caches and the growing dominance of wire delay necessitate significant changes to cache hierarchy design methodologies. Many recent proposals advocate splitting the cache into a large number of banks and employing a network-on-chip (NoC) to allow fast access to nearby banks, an organization referred to as Non-Uniform Cache Architecture (NUCA). Most studies on NUCA organizations have assumed a generic NoC and focused on logical policies for cache block placement, movement, and search. Since wire/router delay and power are major limiting factors in modern processors, this work focuses on interconnect design and its influence on NUCA performance and power. We extend the widely used CACTI cache modeling tool to take network design parameters into account. With these overheads appropriately accounted for, the optimal cache organization is typically very different from that assumed in prior NUCA studies. To alleviate the interconnect delay bottleneck, we propose novel cache access optimizations that introduce heterogeneity within the inter-bank network. The careful consideration of interconnect choices for a large cache results in a 51% performance improvement over a baseline generic NoC, and the introduction of heterogeneity within the network yields an additional 11-15% performance improvement.
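
To make the "non-uniform" part of NUCA concrete, the sketch below estimates the round-trip access latency of each bank in a small 2D grid as a function of its distance from the cache controller. It is a toy illustration, not the paper's model or the extended CACTI tool: the function names, the controller position at grid corner (0, 0), the Manhattan-distance routing, and all cycle counts are assumptions chosen only for this example.

```python
# Toy NUCA latency sketch. All cycle counts are assumed placeholders,
# not values taken from the paper or from CACTI.

def bank_latency(row, col, bank_cycles=3, link_cycles=2, router_cycles=1):
    """Round-trip cycles to access bank (row, col) from a controller at (0, 0)."""
    hops = row + col  # Manhattan distance, assuming a 2D mesh network
    one_way = hops * (link_cycles + router_cycles)
    # Request travels to the bank, the bank is accessed, the reply travels back.
    return 2 * one_way + bank_cycles


def latency_map(rows, cols, **kwargs):
    """Latency of every bank in a rows x cols grid, as a list of lists."""
    return [[bank_latency(r, c, **kwargs) for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    for line in latency_map(4, 4):
        print(line)
```

Under these assumed parameters, banks near the controller are several times faster to reach than banks in the far corner, and the link and router terms quickly outweigh the bank access time itself, which is why the interconnect choices discussed in the abstract matter.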



