Achieving predictable performance through better memory controller placement in many-core CMPs
Source
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: On-chip interconnection networks
Pages 451-461
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Dennis Abts, Google Inc, Madison, WI, USA
Natalie D. Enright Jerger, University of Toronto, Toronto, ON, Canada
John Kim, KAIST, Daejeon, South Korea
Dan Gibson, University of Wisconsin - Madison, Madison, WI, USA
Mikko H. Lipasti, University of Wisconsin - Madison, Madison, WI, USA
ABSTRACT
In the near term, Moore's law will continue to provide an increasing number of transistors and therefore an increasing number of on-chip cores. Limited pin bandwidth prevents the integration of a large number of memory controllers on-chip. With many cores and few memory controllers, where to place the memory controllers in the on-chip interconnection fabric becomes an important and as yet unexplored question. In this paper we show how the location of the memory controllers can reduce contention (hot spots) in the on-chip fabric and lower the variance in reference latency. This in turn provides predictable performance for memory-intensive applications regardless of the processing core on which a thread is scheduled. We explore the design space of on-chip fabrics to find optimal memory controller placement relative to different topologies (i.e., mesh and torus), routing algorithms, and workloads.
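To make the idea of placement-dependent latency variance concrete, the following is a minimal sketch and not the paper's methodology. It assumes an 8x8 grid of tiles, 16 memory controller ports, minimal routing, and memory references spread uniformly over all controllers. The two placements (`rows` and `diagonals`) and every function name are hypothetical choices made for illustration. The sketch counts hops only; it does not model contention, which the abstract identifies as a central effect.

```python
from statistics import mean, pvariance

def hops(a, b, n, torus=False):
    """Minimal hop count between tiles a and b on an n x n mesh or torus."""
    total = 0
    for p, q in zip(a, b):
        d = abs(p - q)
        # On a torus, each dimension wraps around, so take the shorter way.
        total += min(d, n - d) if torus else d
    return total

def mean_hops(core, controllers, n, torus=False):
    """Average hops from one core to all controllers (uniform interleaving assumed)."""
    return mean(hops(core, mc, n, torus) for mc in controllers)

def placement_stats(controllers, n, torus=False):
    """Mean and population variance, across cores, of the per-core average hop count."""
    per_core = [mean_hops((x, y), controllers, n, torus)
                for x in range(n) for y in range(n)]
    return mean(per_core), pvariance(per_core)

N = 8  # assumed grid dimension
# Hypothetical placement 1: controllers along the top and bottom rows.
rows = [(x, y) for x in range(N) for y in (0, N - 1)]
# Hypothetical placement 2: controllers along both diagonals.
diagonals = sorted({(i, i) for i in range(N)} | {(i, N - 1 - i) for i in range(N)})

for name, placement in (("rows", rows), ("diagonals", diagonals)):
    for torus in (False, True):
        avg, var = placement_stats(placement, N, torus)
        topo = "torus" if torus else "mesh"
        print(f"{name:9s} {topo:5s} mean hops = {avg:.3f}  variance = {var:.3f}")
```

A lower variance in this toy metric means that the average distance to memory depends less on which core a thread runs on. A fuller model would also need the routing algorithm and the traffic on each link to capture hot spots.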