ABSTRACT
Shared memory provides a uniform and attractive mechanism for communication. For efficiency, it is often implemented with a layer of interpretive hardware on top of a message-passing communications network. This interpretive layer is responsible for data location, data movement, and cache coherence. It uses patterns of communication that benefit common programming styles, but these patterns are only heuristics. This suggests that certain styles of communication may benefit from direct access to the underlying communications substrate. The Alewife machine, a shared-memory multiprocessor being built at MIT, provides such an interface. The interface is an integral part of the shared-memory implementation: it affords direct, user-level access to the network queues, supports an efficient DMA mechanism, and includes fast trap handling for message reception. This paper discusses the design and implementation of the Alewife message-passing interface and addresses the issues and advantages of using such an interface to complement hardware-synthesized shared memory.