Performance/area efficiency in chip multiprocessors with micro-caches
Source
Proceedings of the 4th International Conference on Computing Frontiers
Ischia, Italy
SESSION: Memory management in parallel systems
Pages: 247-258
Year of Publication: 2007
ISBN: 978-1-59593-683-7
Authors
Michela Becchi  Washington University, St. Louis, MO
Mark A. Franklin  Washington University, St. Louis, MO
Patrick J. Crowley  Washington University, St. Louis, MO
Sponsors
ACM: Association for Computing Machinery
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1242531.1242567

ABSTRACT

This paper proposes the use of very small instruction caches, called micro-caches (μ-caches), consisting of tens to hundreds of bytes, at the bottom of the instruction delivery hierarchy in chip-multiprocessors (CMP). Multi-core architectures place a novel emphasis on the performance/area efficiency of processor cores, and we note that traditional instruction cache sizes reflect an emphasis on hit-rate performance rather than efficiency. In brief, μ-caches reduce the area footprint of individual cores, thus allowing additional cores to fit within a given die area. We use commercial design tools and a commercial processor core to evaluate this tradeoff in the context of high-performance networking, where CMP architectures have had their greatest commercial impact to date. Our results suggest that the use of μ-caches can yield a 25% improvement in efficiency relative to traditional hierarchies. In our evaluation, we consider a range of architectural options (cluster organization, non-blocking caches, cache parameters) and justify our conclusions while accounting for the errors inherent in die area estimates.

