A novel migration-based NUCA design for chip multiprocessors
Source: Conference on High Performance Networking and Computing
Proceedings of the 2008 ACM/IEEE conference on Supercomputing, Austin, Texas
Section: Papers
Article No. 28
Year of Publication: 2008
ISBN: 978-1-4244-2835-9
Authors
Mahmut Kandemir, Pennsylvania State University
Feihui Li, NVIDIA
Mary Jane Irwin, Pennsylvania State University
Seung Woo Son, Pennsylvania State University
Publisher
IEEE Press, Piscataway, NJ, USA
DOI: http://doi.acm.org/10.1145/1413370.1413399

ABSTRACT

Chip Multiprocessors (CMPs) and Non-Uniform Cache Architectures (NUCAs) represent two emerging trends in computer architecture. Targeting future CMP-based systems with NUCA-style L2 caches, this paper proposes and evaluates a novel data migration algorithm for parallel applications. The goal of this migration scheme is to determine a suitable location for each data block within a large L2 space at any given point during execution. A unique characteristic of the proposed scheme is that it models the problem of optimal data placement in the L2 cache space as a two-dimensional post office placement problem; the paper also presents a practical architectural implementation of this model and a detailed evaluation of that implementation. In our experimental evaluation, we compare our approach to a previously proposed NUCA management scheme using applications from the SPEC OMP suite, OLTP, SPECjbb, and SPECweb. These experiments show that our migration approach yields about a 35% improvement, averaged across applications, in average L2 access latency over the previous migration scheme, and that these L2 latency savings translate to an average 9.5% improvement in IPC (instructions per cycle). We also observed that both the careful initial placement of data (which itself triggers migrations within the L2 space) and subsequent migrations (due to inter-processor data sharing) play an important role in achieving these performance improvements.
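To make the post office formulation concrete, here is a minimal sketch under stated assumptions; it is not the authors' architectural implementation. It assumes a 2D mesh on-chip network where each processor and L2 bank sits at integer (x, y) coordinates, that hop distance is Manhattan distance, and that per-processor access counts for a block are known. Under those assumptions the placement cost separates into independent x and y terms, and a weighted median in each dimension gives an optimal bank location. The function names `weighted_median`, `place_block`, and `total_cost` are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def weighted_median(values: List[Tuple[int, int]]) -> int:
    """Return a coordinate c minimizing sum(w * |c - v|) over (v, w) pairs."""
    if not values:
        raise ValueError("no accesses to place")
    ordered = sorted(values)
    total = sum(w for _, w in ordered)
    running = 0
    for coord, weight in ordered:
        running += weight
        # The first point where cumulative weight reaches half the total
        # is a weighted median, hence optimal for this 1D term.
        if 2 * running >= total:
            return coord
    return ordered[-1][0]


def place_block(accesses: Dict[Coord, int]) -> Coord:
    """Hypothetical placement: choose an L2 bank for one block.

    `accesses` maps a requesting processor's mesh position to how many
    times it accessed the block (assumed to be tracked somehow).
    With Manhattan distance, x and y can be optimized separately.
    """
    xs = [(x, n) for (x, _), n in accesses.items()]
    ys = [(y, n) for (_, y), n in accesses.items()]
    return weighted_median(xs), weighted_median(ys)


def total_cost(loc: Coord, accesses: Dict[Coord, int]) -> int:
    """Access-weighted total hop count from all requesters to `loc`."""
    return sum(
        n * (abs(loc[0] - x) + abs(loc[1] - y))
        for (x, y), n in accesses.items()
    )


if __name__ == "__main__":
    # Three processors on a 4x4 mesh sharing one block.
    demo = {(0, 0): 1, (3, 1): 4, (2, 3): 3}
    best = place_block(demo)
    print(best, total_cost(best, demo))  # -> (2, 1) 13
```

Rerunning `place_block` as access counts change would correspond to migrating the block toward its current heaviest users; how often to do so, and how the hardware tracks counts, are separate design questions this sketch does not address.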

