ABSTRACT
This paper addresses a new cache organization for Chip Multiprocessor (CMP) environments. We introduce Nahalal, an architecture whose novel floorplan topology partitions cached data according to its usage (shared versus private), enabling fast access to shared data for all processors while keeping private data close to each processor. Nahalal combines the best of shared and private caches: data is accessed as quickly as in private caches, yet no inter-cache coherence transactions are needed. Detailed simulations in Simics demonstrate that Nahalal decreases cache access latency by up to 41.1% compared to traditional CMP designs, yielding run-time improvements of up to 12.65%.