ABSTRACT
A major trend in high performance computer architecture over the last two decades has been the migration of memory, in the form of high speed caches, onto the microprocessor semiconductor die. Where temporal locality in the computation is high, caches prove very effective at hiding memory access latency and contention for communication resources. However, where temporal locality is absent, caches may exhibit low hit rates, resulting in poor operational efficiency. Vector computing, which exploits pipelined arithmetic units and memory access, addresses this challenge for certain forms of data access patterns, for example those involving long contiguous data sets exhibiting high spatial locality. But for many advanced applications in science, technology, and national security, at least some data access patterns do not conform to the restricted forms handled well by either caches or vector processing. An important alternative is the reverse strategy: migrating logic into the main memory (DRAM) and performing operations directly on the data stored there. Processor in Memory (PIM) architecture has advanced to the point where it may fill this role and provide an important new mechanism for improving the performance and efficiency of future supercomputers across a broad range of applications. One important project considering both the role of PIM in supercomputer architecture and the design of such PIM components is the Cray Cascade Project, sponsored by the DARPA High Productivity Computing Program. Cascade is a Petaflops scale computer targeted for deployment at the end of the decade that merges the raw speed of an advanced custom vector architecture with the high memory bandwidth processing delivered by an innovative class of PIM architecture. The work presented here was performed under the Cascade project to explore critical design space issues that will determine the value of PIM in supercomputers and contribute to the optimization of its design. This work also has strong relevance to hybrid systems that combine conventional microprocessors with advanced PIM-based intelligent main memory.