Performance/area efficiency in chip multiprocessors with micro-caches
Conference On Computing Frontiers
Proceedings of the 4th international conference on Computing frontiers
Ischia, Italy
SESSION: Memory management in parallel systems
Pages: 247 - 258
Year of Publication: 2007
ISBN: 978-1-59593-683-7
ABSTRACT
This paper proposes the use of very small instruction caches, called micro-caches (μ-caches), consisting of tens to hundreds of bytes, at the bottom of the instruction delivery hierarchy in chip-multiprocessors (CMP). Multi-core architectures place a novel emphasis on the performance/area efficiency of processor cores, and we note that traditional instruction cache sizes reflect an emphasis on hit-rate performance rather than efficiency. In brief, μ-caches reduce the area footprint of individual cores, thus allowing additional cores to fit within a given die area. We use commercial design tools and a commercial processor core to evaluate this tradeoff in the context of high-performance networking, where CMP architectures have had their greatest commercial impact to date. Our results suggest that the use of μ-caches can yield a 25% improvement in efficiency relative to traditional hierarchies. In our evaluation, we consider a range of architectural options (cluster organization, non-blocking caches, cache parameters) and justify our conclusions while accounting for the errors inherent in die area estimates.
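The tradeoff described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's evaluation model: the function names, the area figures, and the per-core performance values below are all hypothetical, chosen only to show how a smaller instruction cache can lower per-core performance yet raise performance per unit area and the number of cores that fit on a die. The sketch also ignores any shared higher-level memory that would absorb the extra misses.

```python
def cores_per_die(die_area_mm2, core_area_mm2, icache_area_mm2):
    """Whole cores that fit when each core carries its own instruction cache."""
    return int(die_area_mm2 // (core_area_mm2 + icache_area_mm2))


def efficiency(per_core_perf, core_area_mm2, icache_area_mm2):
    """Performance per unit area of one core plus its instruction cache."""
    return per_core_perf / (core_area_mm2 + icache_area_mm2)


def relative_gain(candidate, baseline):
    """Fractional improvement of candidate over baseline (0.1 means +10%)."""
    return candidate / baseline - 1.0


# Hypothetical inputs (NOT taken from the paper):
# a 1.0 mm^2 core with either a 1.0 mm^2 conventional instruction cache
# or a 0.25 mm^2 micro-cache that costs 20% of per-core performance.
DIE = 100.0
CORE = 1.0
conventional = {"icache": 1.0, "perf": 1.0}
micro = {"icache": 0.25, "perf": 0.8}

eff_conventional = efficiency(conventional["perf"], CORE, conventional["icache"])
eff_micro = efficiency(micro["perf"], CORE, micro["icache"])

print("cores (conventional):", cores_per_die(DIE, CORE, conventional["icache"]))
print("cores (micro-cache): ", cores_per_die(DIE, CORE, micro["icache"]))
print("efficiency gain:      {:.0%}".format(relative_gain(eff_micro, eff_conventional)))
```

With these made-up inputs the micro-cache configuration fits more cores and comes out ahead on performance per area even though each core is slower; whether that holds in practice depends on the measured miss penalties and area estimates, which is what the paper evaluates.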