ABSTRACT
During the period following the completion of the Cosmic Cube experiment [1], and while commercial descendants of this first-generation multicomputer (message-passing concurrent computer) were spreading through a community that includes many of the attendees of this conference, members of our research group were developing a set of ideas about the physical design and programming for the second generation of medium-grain multicomputers.
Our principal goal was to improve by as much as two orders of magnitude the relationship between message-passing and computing performance, and also to make the topology of the message-passing network practically invisible. Decreasing the communication latency relative to instruction execution times extends the application span of multicomputers from easily partitioned and distributed problems (e.g., matrix computations, PDE solvers, finite element analysis, finite difference methods, distant or local field many-body problems, FFTs, ray tracing, distributed simulation of systems composed of loosely coupled physical processes) to computing problems characterized by “high flux” [2] or relatively fine-grain concurrent formulations [3, 4] (e.g., searching, sorting, concurrent data structures, graph problems, signal processing, image processing, and distributed simulation of systems composed of many tightly coupled physical processes). Such applications place heavy demands on the message-passing network for high bandwidth, low latency, and non-local communication. Decreased message latency also improves the efficiency of the class of applications that have been developed on first-generation systems, and the insensitivity of message latency to process placement simplifies the concurrent formulation of application programs.
Our other goals included a streamlined and easily layered set of message primitives, a node operating system based on a reactive programming model, open interfaces for accelerators and peripheral devices, and node performance improvements that could be achieved economically by using the same technology employed in contemporary workstation computers.
By the autumn of 1986, these ideas had become sufficiently developed, molded together, and tested through simulation to be regarded as a complete architectural design. We were fortunate that the Ametek Computer Research Division was ready and willing to work with us to develop this system as a commercial product. The Ametek Series 2010 multicomputer is the result of this joint effort.
REFERENCES
[2] J D Ullman, "Flux, Sorting, and Supercomputer Organization for AI Applications," J of Parallel and Distributed Computing 1: 133--151, 1984.
[6] P Kermani and L Kleinrock, "Virtual Cutthrough: A New Computer Communication Switching Technique," Computer Networks 3: 267--286, 1979.
[7] William J Dally, Charles L Seitz, "The Torus Routing Chip," Distributed Computing 1(4): 187--196, Springer International, 1986.
[8] William J Dally, "Wire-Efficient VLSI Multiprocessor Communication Networks," Proc. 1987 Stanford Conference on Advanced Research in VLSI, MIT Press, 1987.
CITED BY 46
P. K. McKinley, H. Xu, E. T. Kalns, L. M. Ni, ComPaSS: efficient communication services for scalable architectures, Proceedings of the 1992 ACM/IEEE conference on Supercomputing, p.478-487, November 16-20, 1992, Minneapolis, Minnesota, United States
A. J. Martin, A message-passing model for highly concurrent computation, Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues, p.520-527, January 19-20, 1988, Pasadena, California, United States
Amotz Bar-Noy, Prabhakar Raghavan, Baruch Schieber, Hisao Tamaki, Fast deflection routing for packets and worms, Proceedings of the twelfth annual ACM symposium on Principles of distributed computing, p.75-86, August 15-18, 1993, Ithaca, New York, United States
M. Chen, Y. Choo, J. Li, Crystal: from functional description to efficient parallel code, Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues, p.417-433, January 19-20, 1988, Pasadena, California, United States
Shekhar Borkar, Robert Cohn, George Cox, Thomas Gross, H. T. Kung, Monica Lam, Margie Levine, Brian Moore, Wire Moore, Craig Peterson, Jim Susman, Jim Sutton, John Urbanski, Jon Webb, Supporting systolic and memory communication in iWarp, ACM SIGARCH Computer Architecture News, v.18 n.3a, p.70-81, June 1990