Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Source
ACM SIGARCH Computer Architecture News, Volume 37, Issue 3 (June 2009)
Session: DRAM and SSD
Pages 255-266
Year of Publication: 2009
ISSN: 0163-5964
Authors
Hongzhong Zheng, University of Illinois at Chicago, Chicago, IL, USA
Jiang Lin, IBM Corp., Austin, TX, USA
Zhao Zhang, Iowa State University, Ames, IA, USA
Zhichun Zhu, University of Illinois at Chicago, Chicago, IL, USA
ABSTRACT
The widespread use of multicore processors has dramatically increased the demand for high bandwidth and large capacity from memory systems. In a conventional DDR2/DDR3 DRAM memory system, the memory bus and DRAM devices run at the same data rate. To improve memory bandwidth, we propose a new memory system design called decoupled DIMM that allows the memory bus to operate at a data rate much higher than that of the DRAM devices. In this design, a synchronization buffer is added to relay data between the slow DRAM devices and the fast memory bus, and memory access scheduling is revised to avoid access conflicts on memory ranks. The design not only improves memory bandwidth beyond what current memory devices can support, but also improves reliability, power efficiency, and cost effectiveness by using relatively slow memory devices. The idea of decoupling, more precisely decoupling the bandwidth match between the memory bus and a single rank of devices, can also be applied to other types of memory systems, including FB-DIMM. Our experimental results show that a decoupled DIMM system with a 2667 MT/s bus data rate and a 1333 MT/s device data rate improves the performance of memory-intensive workloads by 51% on average over a conventional memory system with a 1333 MT/s data rate. Alternatively, a decoupled DIMM system with a 1600 MT/s bus data rate and an 800 MT/s device data rate incurs only an 8% performance loss compared with a conventional system with a 1600 MT/s data rate, with a 16% reduction in memory power consumption and a 9% saving in memory energy.
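To make the data rates quoted in the abstract concrete, the following is a minimal sketch of the peak-bandwidth arithmetic behind them. It is not taken from the paper: it assumes a 64-bit (8-byte) data bus per channel, as is typical for DDR2/DDR3 DIMMs, and the function names and configuration labels are illustrative only.

```python
# Illustrative arithmetic only; the 8-byte bus width is an assumption,
# not a figure stated in the abstract.

def peak_bandwidth_mb_s(data_rate_mts, bus_width_bytes=8):
    """Peak transfer bandwidth in MB/s for a given data rate in MT/s."""
    return data_rate_mts * bus_width_bytes


def bus_to_device_ratio(bus_rate_mts, device_rate_mts):
    """How many times faster the memory bus runs than a single rank of devices."""
    return bus_rate_mts / device_rate_mts


# (bus data rate, device data rate) in MT/s, as quoted in the abstract.
configs = {
    "decoupled 2667/1333": (2667, 1333),
    "decoupled 1600/800": (1600, 800),
}

for name, (bus, device) in configs.items():
    print(
        f"{name}: bus peak {peak_bandwidth_mb_s(bus)} MB/s, "
        f"single-rank peak {peak_bandwidth_mb_s(device)} MB/s, "
        f"ratio {bus_to_device_ratio(bus, device):.2f}"
    )
```

Under this assumption, both configurations have a bus that can carry about twice what one rank of devices can deliver, which suggests why the design needs a synchronization buffer and revised scheduling across ranks to make use of the faster bus.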