ABSTRACT
The MIT J-Machine multicomputer has been constructed to study the role of a set of primitive mechanisms in providing efficient support for parallel computing. Each J-Machine node consists of an integrated multicomputer component, the Message-Driven Processor (MDP), and 1 MByte of DRAM. The MDP provides mechanisms to support efficient communication, synchronization, and naming. A 512-node J-Machine is operational and is due to be expanded to 1024 nodes in March 1993. In this paper we discuss the design of the J-Machine and evaluate the effectiveness of the mechanisms incorporated into the MDP. We measure the performance of the communication and synchronization mechanisms directly and investigate the behavior of four complete applications.