SP-NUCA: a cost effective dynamic non-uniform cache architecture
Source
ACM SIGARCH Computer Architecture News
Volume 36, Issue 2 (May 2008)
Pages 64-71
Year of Publication: 2008
ISSN: 0163-5964
Authors
Javier Merino, Universidad de Cantabria, Spain
Valentín Puente, Universidad de Cantabria, Spain
Pablo Prieto, Universidad de Cantabria, Spain
José Ángel Gregorio, Universidad de Cantabria, Spain
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1399972.1399973

ABSTRACT

This paper presents a simple but effective method to reduce on-chip access latency and improve core isolation in CMP Non-Uniform Cache Architectures (NUCA). The paper introduces a feasible way to allocate cache blocks according to their access pattern. Each L2 bank is dynamically partitioned at set level into private and shared content. Simply by adjusting the replacement algorithm, we can place private data closer to its owner processor. In contrast, shared data is always placed in the same position, independently of the accessing processor. This approach is capable of reducing on-chip latency without significantly sacrificing hit rates or increasing implementation cost over that of a conventional static NUCA. Additionally, most of the unnecessary interference between cores in private accesses is removed.
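
As a rough illustration of the placement idea described above, the following Python sketch models where a block would be allocated and how one set could hold both kinds of content. It is not the authors' implementation. The bank count, the block size, the rule that a core's nearest bank carries the core's own index, the `is_private` flag, and the plain LRU victim choice are all assumptions made for this example; the abstract does not specify any of them.

```python
from dataclasses import dataclass, field

NUM_BANKS = 8     # assumption: one L2 bank per core
BLOCK_BITS = 6    # assumption: 64-byte cache blocks


def shared_bank(addr: int) -> int:
    """Shared data: the bank depends only on the address, so every
    core finds the block in the same position."""
    return (addr >> BLOCK_BITS) % NUM_BANKS


def private_bank(core: int) -> int:
    """Private data: the bank closest to the owner core (assumed here
    to be the bank with the same index as the core)."""
    return core % NUM_BANKS


def place(addr: int, core: int, is_private: bool) -> int:
    """Pick the destination bank for a block (hypothetical helper)."""
    return private_bank(core) if is_private else shared_bank(addr)


@dataclass
class Line:
    tag: int
    private: bool   # marks the block as private or shared content
    last_use: int   # timestamp used for LRU ordering


@dataclass
class CacheSet:
    ways: int
    lines: list = field(default_factory=list)

    def insert(self, tag: int, private: bool, now: int):
        """Insert a block. When the set is full, evict the least
        recently used line regardless of kind, so the private/shared
        split of the set shifts with demand (assumed policy)."""
        victim = None
        if len(self.lines) >= self.ways:
            victim = min(self.lines, key=lambda l: l.last_use)
            self.lines.remove(victim)
        self.lines.append(Line(tag, private, now))
        return victim
```

In this sketch the only difference between private and shared handling is the bank chosen at allocation time and the per-line flag; the number of ways of each kind in a set is never fixed in advance, which is one simple way to read "dynamically partitioned at set level".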

To support the architectural decisions adopted and provide a comparative study, a comprehensive evaluation framework is employed. The workbench is composed of a full-system simulator and a representative set of multithreaded and multiprogrammed workloads. With this infrastructure, different alternatives for the coherence protocol, replacement policies, and cache utilization are analyzed to find the optimal proposal. We conclude that the cost of a feasible implementation should be closer to that of a conventional static NUCA, and significantly less than that of a dynamic NUCA.

Finally, a comparison with static and dynamic NUCA is presented. The simulation results suggest that, on average, the proposed mechanism could improve system performance over a static NUCA and an idealized dynamic NUCA by 16% and 6%, respectively.



