ABSTRACT
This paper discusses die cost vs. performance tradeoffs for a processing-in-memory (PIM) system that could serve as the memory system of a host processor. At less than twice the die cost of a commodity DRAM part, it is possible to realize a performance speedup of nearly a factor of 4 on irregular applications. This cost efficiency derives from a custom multithreaded processor architecture and implementation style that is well suited for embedding in memory. Specifically, the design exploits the low latency and high row bandwidth of the embedded memory both to simplify the processor, reducing its area, and to improve processing throughput. To support our claims of cost and performance, we use simulation and analysis of existing chips, and we have also designed and fully implemented a prototype chip, PIM Lite.
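The abstract attributes part of the cost efficiency to the low latency of memory embedded next to the processor. One way to see why low latency can simplify a multithreaded design is a toy throughput model: if each thread issues one instruction and then waits a fixed L cycles for memory, a fine-grained multithreaded processor keeps its pipeline busy only when it has enough threads to cover that wait, giving a utilization of roughly min(1, N/(1+L)) for N threads. The sketch below is a hypothetical Python model, not the PIM Lite microarchitecture; the one-instruction-per-access pattern, the round-robin scheduler, the function names, and the latency values are all assumptions chosen for illustration.

```python
def simulate_utilization(num_threads: int, mem_latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which some thread issues an instruction.

    Toy model (an assumption, not the PIM Lite microarchitecture): each thread
    issues one instruction, then stalls for `mem_latency` cycles on a memory
    access before it can issue again. Each cycle, a round-robin scheduler
    issues from the first ready thread after the one that issued last.
    """
    ready_at = [0] * num_threads  # earliest cycle at which each thread may issue
    next_thread = 0
    issued = 0
    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                issued += 1
                ready_at[t] = cycle + 1 + mem_latency  # blocked until the access returns
                next_thread = (t + 1) % num_threads
                break
    return issued / cycles


def analytic_utilization(num_threads: int, mem_latency: int) -> float:
    """Closed form for the same model: N threads cover a (1 + L)-cycle turnaround."""
    return min(1.0, num_threads / (1 + mem_latency))


def threads_for_full_issue(mem_latency: int) -> int:
    """Smallest thread count that keeps the pipeline busy every cycle in this model."""
    return 1 + mem_latency


if __name__ == "__main__":
    # Hypothetical latencies: a short on-chip access versus a long off-chip one.
    for latency in (3, 30):
        n = threads_for_full_issue(latency)
        print(f"latency={latency:2d} cycles: {n} threads for full issue; "
              f"4 threads give {simulate_utilization(4, latency):.2f} "
              f"(analytic {analytic_utilization(4, latency):.2f})")
```

Under this model, four threads keep the pipeline fully busy at a 3-cycle latency but reach only about 13% utilization at 30 cycles, where 31 threads would be needed. Fewer thread contexts mean fewer register sets to build, which is one plausible route to the area savings the abstract describes.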