Scaling the bandwidth wall: challenges in and avenues for CMP scaling
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: Potpourri
Pages 371-382
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Brian M. Rogers, North Carolina State University, Raleigh, NC, USA
Anil Krishna, IBM, Research Triangle Park, NC, USA
Gordon B. Bell, IBM, Research Triangle Park, NC, USA
Ken Vu, IBM, Research Triangle Park, NC, USA
Xiaowei Jiang, North Carolina State University, Raleigh, NC, USA
Yan Solihin, North Carolina State University, Raleigh, NC, USA
ABSTRACT
As transistor density continues to grow exponentially in accordance with Moore's law, the goal for many Chip Multi-Processor (CMP) systems is to scale the number of on-chip cores proportionally. Unfortunately, off-chip memory bandwidth is projected to grow slowly compared to the desired growth in the number of cores. As a result, each core will have a decreasing share of off-chip bandwidth with which to load its data from off-chip memory. This situation, in which off-chip bandwidth becomes a performance and throughput bottleneck, is referred to as the bandwidth wall problem. In this study, we seek to answer two questions: (1) to what extent does the bandwidth wall problem restrict future multicore scaling, and (2) to what extent can various bandwidth conservation techniques mitigate this problem? To address these questions, we develop a simple but powerful analytical model to predict the number of on-chip cores that a CMP can support given a limited growth in memory traffic capacity. We find that the bandwidth wall can severely limit core scaling. Starting with a balanced 8-core CMP, in four technology generations the number of cores can scale only to 24 without increasing the memory traffic requirement, as opposed to 128 cores under proportional scaling. We find that the individual bandwidth conservation techniques we evaluate vary widely in their impact on core scaling and that, when combined, they have the potential to enable super-proportional core scaling for up to four technology generations.
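The abstract does not spell out the analytical model, so the following is only a minimal sketch of how such a core-scaling estimate might be set up, not the authors' formulation. It assumes (hypothetically) that die area is counted in equal units that can each hold either one core or one unit of cache, that the balanced baseline has 8 cores and 8 cache units, that the die area doubles each generation, and that per-core miss traffic follows a power law in cache per core with exponent `alpha = 0.5` (the commonly cited square-root rule). The function name `max_cores` and all parameter names are illustrative.

```python
def max_cores(total_area, base_cores=8, base_cache=8, alpha=0.5,
              traffic_budget=1.0):
    """Largest core count whose off-chip traffic stays within the budget.

    Assumptions (not taken from the abstract): area is measured in units
    that hold one core or one unit of cache, and per-core traffic scales
    as (cache per core) ** -alpha. Traffic is expressed relative to the
    baseline chip, so traffic_budget=1.0 means "no traffic growth".
    """
    base_cache_per_core = base_cache / base_cores
    best = 0
    for cores in range(1, total_area):
        cache_per_core = (total_area - cores) / cores
        # Relative traffic = (core ratio) * (per-core traffic ratio).
        traffic = (cores / base_cores) * (
            cache_per_core / base_cache_per_core) ** (-alpha)
        if traffic <= traffic_budget:
            best = cores
    return best


if __name__ == "__main__":
    # Four generations of area doubling from a 16-unit baseline: 16 -> 256.
    for generation in range(5):
        area = 16 * 2 ** generation
        print(generation, area, max_cores(area))
```

Under these assumptions, a 256-unit die supports at most 24 cores at constant memory traffic, which agrees with the figure quoted in the abstract, while proportional scaling would place 128 cores on the same die. Raising `traffic_budget` above 1.0 gives a rough way to explore how extra bandwidth, or techniques that reduce traffic, would shift that limit.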