| Adaptive set pinning: managing shared caches in chip multiprocessors |
Source
Architectural Support for Programming Languages and Operating Systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Seattle, WA, USA
SESSION: Microarchitecture
Pages 135-144
Year of Publication: 2008
ISBN: 978-1-59593-958-6
ABSTRACT
As part of the trend towards Chip Multiprocessors (CMPs) for the next leap in computing performance, many architectures have explored sharing the last level of cache among different processors for a better performance-cost ratio and improved resource allocation. Shared cache management is a crucial aspect of CMP design for system performance. This paper first presents a new classification of cache misses for CMPs with shared caches, CII (Compulsory, Inter-processor and Intra-processor misses), to provide a better understanding of how the memory transactions of different processors interact at the shared cache level. We then propose a novel approach, called set pinning, for eliminating inter-processor misses and reducing intra-processor misses in a shared cache. Furthermore, we show that an adaptive set pinning scheme improves on the set pinning scheme by significantly reducing the number of off-chip accesses. Extensive analysis of these approaches with SPEComp 2001 benchmarks is performed using a full system simulator. Our experiments indicate that the set pinning scheme achieves an average improvement of 22.18% in the L2 miss rate, while the adaptive set pinning scheme reduces the miss rate by an average of 47.94%, both compared to the traditional shared cache scheme. The two schemes also improve performance by 7.24% and 17.88%, respectively.
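The CII classification named in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes that a non-compulsory miss counts as inter-processor when the block was last evicted by a fill from a different processor, and as intra-processor when it was evicted by the same processor that now requests it. The class name `SharedCache`, the LRU replacement policy, and the modulo set indexing are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class SharedCache:
    """Toy shared set-associative LRU cache that labels every access.

    Assumed reading of CII (not confirmed by the abstract):
      compulsory - first reference to the block by any processor
      inter      - block was last evicted by a different processor
      intra      - block was last evicted by the requesting processor
    """

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # One LRU-ordered dict per set; the oldest block comes first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.seen = set()      # blocks referenced at least once
        self.evicted_by = {}   # block -> processor whose fill evicted it
        self.counts = {"hit": 0, "compulsory": 0, "inter": 0, "intra": 0}

    def access(self, cpu, block):
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)  # refresh LRU position
            kind = "hit"
        else:
            if block not in self.seen:
                kind = "compulsory"
            elif self.evicted_by[block] == cpu:
                kind = "intra"
            else:
                kind = "inter"
            if len(lines) >= self.ways:
                # Evict the least recently used block and record the culprit.
                victim, _ = lines.popitem(last=False)
                self.evicted_by[victim] = cpu
            lines[block] = None
            self.seen.add(block)
        self.counts[kind] += 1
        return kind


if __name__ == "__main__":
    cache = SharedCache(num_sets=1, ways=1)
    for cpu, block in [(0, 10), (1, 20), (0, 10), (0, 20), (0, 20)]:
        print(cpu, block, cache.access(cpu, block))
```

Under these assumptions, a scheme that removes the "inter" category would have to prevent one processor's fills from evicting blocks that another processor still needs, which is the effect the abstract attributes to set pinning.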
CITED BY 4
Seung Woo Son , Mahmut Kandemir , Mustafa Karakoy , Dhruva Chakrabarti, A compiler-directed data prefetching scheme for chip multiprocessors, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA