ABSTRACT
Chip Multiprocessors (CMPs) and Non-Uniform Cache Architectures (NUCAs) represent two emerging trends in computer architecture. Targeting future CMP-based systems with NUCA-type L2 caches, this paper proposes and evaluates a novel data migration algorithm for parallel applications. The goal of this migration scheme is to determine a suitable location for each data block within a large L2 space at any given point during execution. A unique characteristic of the proposed scheme is that it models the problem of optimal data placement in the L2 cache space as a two-dimensional post office placement problem. We present a practical architectural implementation of this model and give a detailed evaluation of that implementation. In our experimental evaluation, we also compare our approach to a previously proposed NUCA management scheme using applications from the SPEC OMP suite, OLTP, SPECjbb, and SPECweb. These experiments show that our migration approach improves average L2 access latency by about 35%, on average, over the previous migration scheme, and that these L2 latency savings translate, on average, to a 9.5% improvement in IPC (instructions per cycle). We also observed during our experiments that both the careful initial placement of data (which itself triggers migrations within the L2 space) and subsequent migrations (due to inter-processor data sharing) play an important role in achieving these performance improvements.
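To make the post office formulation concrete, the following is a minimal sketch of the classic two-dimensional post office placement problem, not the architectural implementation evaluated in the paper. It assumes that the cores sharing a block sit at integer grid coordinates, that each core's access count serves as its weight, and that distance is measured in Manhattan hops; none of these details are stated in the abstract, and the function names are hypothetical. Under Manhattan distance the problem separates by axis, so the weighted median of the x coordinates and the weighted median of the y coordinates together give a cost-minimizing location.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def weighted_median(values: Sequence[int], weights: Sequence[int]) -> int:
    """Return a value v such that at least half the total weight lies at or below v."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    running = 0
    for value, weight in pairs:
        running += weight
        if 2 * running >= total:
            return value
    raise ValueError("weighted_median needs at least one value")


def post_office_location(cores: Sequence[Point], accesses: Sequence[int]) -> Point:
    """Pick the grid location minimizing the weighted Manhattan distance to all cores.

    Assumption: 'cores' are (x, y) positions of the requesters and 'accesses'
    are their access counts for one data block (both hypothetical inputs).
    """
    xs: List[int] = [x for x, _ in cores]
    ys: List[int] = [y for _, y in cores]
    return weighted_median(xs, accesses), weighted_median(ys, accesses)


def weighted_cost(loc: Point, cores: Sequence[Point], accesses: Sequence[int]) -> int:
    """Total weighted Manhattan distance from 'loc' to every core."""
    return sum(
        w * (abs(loc[0] - x) + abs(loc[1] - y))
        for (x, y), w in zip(cores, accesses)
    )


# Four cores at the corners of a 4x4 grid; the core at (0, 0) dominates accesses.
cores = [(0, 0), (3, 0), (0, 3), (3, 3)]
accesses = [10, 1, 1, 1]
best = post_office_location(cores, accesses)
```

In this example the block is drawn toward the heavily accessing core at (0, 0). If the access weights change during execution, recomputing the location yields a new target, which is one way to picture why inter-processor sharing would trigger migrations.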