ABSTRACT
Current high-performance computer systems use large, complex superscalar CPUs that interface to main memory through a hierarchy of caches and interconnect systems. These CPU-centric designs invest considerable power and chip area to bridge the widening gap between CPU and main-memory speeds. Yet many large applications do not run well on these systems and are limited by the performance of the memory subsystem.

This paper argues for an integrated system approach that tightly couples less powerful CPUs with advanced memory technologies to build competitive systems at greatly reduced cost and complexity. Based on a design study using the next-generation 0.25 µm, 256 Mbit dynamic random-access memory (DRAM) process, and on an analysis of existing machines, we show that processor-memory integration can be used to build competitive, scalable, and cost-effective multiprocessor (MP) systems.

We present results from execution-driven uniprocessor and multiprocessor simulations showing that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor. In this system, small direct-mapped instruction caches with long lines are very effective, as are column-buffer data caches augmented with a victim cache.
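The benefit of pairing a direct-mapped cache with a victim cache can be illustrated with a small model. The following is a minimal sketch, not the authors' execution-driven simulator: it models a generic direct-mapped cache (the column-buffer organization of the paper is not modeled) backed by a small fully associative victim cache that holds recently evicted lines. The class name, the cache geometry (4 lines of 32 bytes), the victim-cache sizes, and the address trace are all hypothetical, chosen only to show two addresses that conflict in the same direct-mapped set.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Hypothetical direct-mapped cache backed by a small fully associative victim cache.

    Sizes are illustrative assumptions, not parameters from the paper.
    """

    def __init__(self, num_lines, line_bytes, victim_entries):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.victim_entries = victim_entries
        self.lines = [None] * num_lines  # tag held in each set, or None if empty
        self.victim = OrderedDict()      # block address -> None, oldest insertion first
        self.hits = 0
        self.victim_hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_bytes   # block (line) address
        index = block % self.num_lines    # direct-mapped: exactly one candidate set
        tag = block // self.num_lines
        if self.lines[index] == tag:
            self.hits += 1
            return "hit"
        old_tag = self.lines[index]
        if block in self.victim:
            # Swap: the victim entry moves back into the main cache.
            del self.victim[block]
            self.victim_hits += 1
            result = "victim_hit"
        else:
            self.misses += 1
            result = "miss"
        self.lines[index] = tag
        if old_tag is not None:
            # The displaced line goes into the victim cache (FIFO replacement).
            old_block = old_tag * self.num_lines + index
            self.victim[old_block] = None
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # drop the oldest-inserted entry
        return result


def run(trace, victim_entries):
    """Replay an address trace; return (main hits, victim hits, misses)."""
    cache = DirectMappedWithVictim(num_lines=4, line_bytes=32,
                                   victim_entries=victim_entries)
    for addr in trace:
        cache.access(addr)
    return cache.hits, cache.victim_hits, cache.misses


# Addresses 0 and 128 map to the same set, so they evict each other.
trace = [0, 128] * 4
print(run(trace, 0))  # (0, 0, 8): every access is a conflict miss
print(run(trace, 1))  # (0, 6, 2): a one-entry victim cache absorbs the conflicts
```

With no victim cache, the alternating accesses thrash the single set; a single victim entry turns all but the two cold misses into victim hits. This sketch only illustrates why a victim cache helps a direct-mapped organization and makes no claim about the magnitudes reported in the paper.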