ABSTRACT
Given a fixed CPU architecture and a fixed DRAM timing specification, there is still a large design space for DRAM system organization. Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and organizations, turnaround overhead, the memory controller's page protocol, and the algorithms used to assign request priorities and schedule requests dynamically. Across this design space we see a wide variation in application execution times. For example, execution times for the SPEC CPU 2000 integer suite on a 2-way ganged Direct Rambus organization (32 data bits) with 64-byte bursts are 10-20% lower than on an otherwise identical configuration that uses 32-byte bursts. These two configurations are relatively close to each other in the design space; performance differences become even more pronounced for designs that are further apart.
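To make the burst-size comparison concrete, the sketch below computes how long a single burst occupies a 32-bit (2-way ganged) data path. It assumes a data rate of 800 MT/s per pin, i.e. a 1.25 ns beat, as on PC800-class Direct RDRAM. That rate, the 128-byte block size, and the names `Channel`, `burst_time_ps` and `bursts_per_block` are illustrative assumptions, not parameters taken from this paper. The sketch covers only raw data-transfer time and does not explain why the 64-byte configuration runs faster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    data_bits: int       # width of the data path (32 for 2-way ganged Direct Rambus)
    beat_period_ps: int  # ASSUMPTION: 1250 ps per beat, i.e. 800 MT/s (PC800-class)

    def burst_time_ps(self, burst_bytes: int) -> int:
        """Picoseconds the data path is busy streaming one burst."""
        bytes_per_beat = self.data_bits // 8
        beats = -(-burst_bytes // bytes_per_beat)  # ceiling division
        return beats * self.beat_period_ps

    @staticmethod
    def bursts_per_block(block_bytes: int, burst_bytes: int) -> int:
        """Number of separate bursts needed to move one block."""
        return -(-block_bytes // burst_bytes)  # ceiling division


ganged = Channel(data_bits=32, beat_period_ps=1250)
BLOCK_BYTES = 128  # HYPOTHETICAL cache-block size, for illustration only

for burst in (32, 64):
    print(f"{burst}-byte burst: {ganged.burst_time_ps(burst) / 1000} ns, "
          f"{Channel.bursts_per_block(BLOCK_BYTES, burst)} bursts per block")
```

Under these assumptions a 32-byte burst occupies the data path for 10 ns and a 64-byte burst for 20 ns; a 128-byte block needs four of the former or two of the latter.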
This paper characterizes the sources of overhead in high-performance DRAM systems and investigates the most effective ways to reduce a system's exposure to performance loss. In particular, we look at mechanisms to increase a system's support for concurrent transactions, mechanisms to reduce request latency, and mechanisms to reduce “system overhead”: the portion of the primary memory system's overhead that is due not to DRAM latency but to factors such as turnaround time, request queueing, and inefficiencies from interleaving read and write requests. Our simulator models a 2GHz, highly aggressive out-of-order uniprocessor. The interface to the memory system is fully non-blocking: it supports up to 32 outstanding misses at both the level-1 and level-2 caches and uses split-transaction buses to all DRAM banks.
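The split between DRAM latency and “system overhead” can be illustrated with a simple per-request accounting. This is a hypothetical sketch, not the paper's measurement method. It assumes each request records when it entered the memory controller, when its data finished returning, and its intrinsic DRAM access time, and it attributes everything else (queueing, turnaround, read/write interleaving) to system overhead. `Request` and `breakdown` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Request:
    issue_ns: int     # time the miss entered the memory controller
    complete_ns: int  # time the requested data finished returning
    dram_ns: int      # ASSUMED intrinsic DRAM time (e.g. activate + column access + burst)


def breakdown(requests: Iterable[Request]) -> Dict[str, int]:
    """Sum per-request latency and split it into DRAM latency and system overhead.

    Latencies of concurrent requests overlap in time, so these totals describe
    where request latency goes, not wall-clock execution time.
    """
    total = dram = 0
    for r in requests:
        latency = r.complete_ns - r.issue_ns
        if r.dram_ns > latency:
            raise ValueError("DRAM time cannot exceed end-to-end latency")
        total += latency
        dram += r.dram_ns
    return {"total": total, "dram_latency": dram, "system_overhead": total - dram}


# Two hypothetical overlapping reads: the second waits behind the first.
reqs = [Request(0, 100, 60), Request(10, 150, 60)]
print(breakdown(reqs))
```

For these two requests the total latency is 240 ns, of which 120 ns is DRAM access time and the remaining 120 ns is counted as system overhead.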