Missing the memory wall: the case for processor/memory integration
Source: International Symposium on Computer Architecture
Proceedings of the 23rd Annual International Symposium on Computer Architecture
Philadelphia, Pennsylvania, United States
Pages: 90 - 101  
Year of Publication: 1996
ISBN:0-89791-786-3
Authors
Ashley Saulsbury  Swedish Institute of Computer Science
Fong Pong  Sun Microsystems Computer Corporation
Andreas Nowatzyk  Sun Microsystems Computer Corporation
Sponsors
IEEE-CS\TCCA : TC on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/232973.232984

ABSTRACT

Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-centric designs invest a lot of power and chip area to bridge the widening gap between CPU and main memory speeds. Yet, many large applications do not operate well on these systems and are limited by the memory subsystem performance.

This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and complexity. Based on a design study using the next generation 0.25µm, 256Mbit dynamic random-access memory (DRAM) process and on the analysis of existing machines, we show that processor memory integration can be used to build competitive, scalable and cost-effective MP systems.

We present results from execution driven uni- and multi-processor simulations showing that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor. In this system, small direct mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache.
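The abstract's closing sentence pairs a direct-mapped cache having long lines with a small victim cache. As a rough illustration of that general idea only, the Python sketch below models a direct-mapped cache whose evicted lines fall into a small fully associative victim cache. It is not the paper's simulator: the class name, the line count, the 512-byte line size, the two-entry victim cache, and the LRU replacement policy are all assumptions chosen for the example.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Toy direct-mapped cache backed by a small fully associative victim cache.

    All sizes are illustrative assumptions, not figures from the paper.
    """

    def __init__(self, num_lines=4, line_bytes=512, victim_entries=2):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.victim_entries = victim_entries
        self.lines = [None] * num_lines   # line address held in each slot
        self.victim = OrderedDict()       # evicted line addresses, oldest first
        self.hits = 0
        self.victim_hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return 'hit', 'victim_hit', or 'miss'."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines

        if self.lines[index] == line_addr:
            self.hits += 1
            return "hit"

        evicted = self.lines[index]
        if line_addr in self.victim:
            # The line was recently displaced by a conflict: swap it back in.
            del self.victim[line_addr]
            self.victim_hits += 1
            result = "victim_hit"
        else:
            self.misses += 1
            result = "miss"

        self.lines[index] = line_addr
        if evicted is not None:
            # The displaced line moves to the victim cache (assumed LRU).
            self.victim[evicted] = True
            self.victim.move_to_end(evicted)
            while len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)
        return result


if __name__ == "__main__":
    cache = DirectMappedWithVictim()
    # Addresses 0 and 2048 map to the same slot (4 lines x 512 bytes).
    for addr in (0, 2048, 0, 2048, 2056):
        print(addr, cache.access(addr))
```

In this toy model, two addresses that alternate on the same direct-mapped slot stop going all the way to memory after their first touches, because each displaced line is still held in the victim cache when it is needed again.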


