The MIT Alewife machine: architecture and performance

Source
International Symposium on Computer Architecture
Proceedings of the 22nd annual international symposium on Computer architecture
S. Margherita Ligure, Italy
Pages: 2 - 13
Year of Publication: 1995
ISBN: 0-89791-698-0

Authors

Anant Agarwal
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

Ricardo Bianchini
University of Rochester, Rochester, NY and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

David Chaiken
Digital Equipment Corporation Systems Research Center, Palo Alto, CA and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

Kirk L. Johnson
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

David Kranz
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

John Kubiatowicz
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

Beng-Hong Lim
IBM T.J. Watson Research Center, Yorktown Heights, NY and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

Kenneth Mackenzie
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

Donald Yeung
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts

Bibliometrics
Downloads (6 Weeks): 7, Downloads (12 Months): 54, Citation Count: 87

ABSTRACT
Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a prototype implementation of the architecture, demonstrates that a parallel system can be both scalable and programmable. Four mechanisms combine to achieve these goals: software-extended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating system designers to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency tolerance mechanisms, including block multithreading and prefetching, mask unavoidable delays due to communication.

Microbenchmarks, together with over a dozen complete applications running on the 32-node prototype, help to analyze the behavior of the system. Analysis shows that integrating message passing with shared memory enables a cost-efficient solution to the cache coherence problem and provides a rich set of programming primitives. Block multithreading and prefetching improve performance by up to 25% individually, and 35% together. Finally, language constructs that allow programmers to express fine-grain synchronization can improve performance by over a factor of two.

CITED BY 87

Robert Stets, Sandhya Dwarkadas, Nikolaos Hardavellas, Galen Hunt, Leonidas Kontothanassis, Srinivasan Parthasarathy, Michael Scott, Cashmere-2L: software coherent shared memory on a clustered remote-write network, ACM SIGOPS Operating Systems Review, v.31 n.5, p.170-183, Dec. 1997

Umakishore Ramachandran, Gautam Shah, Anand Sivasubramaniam, Aman Singla, Ivan Yanasak, Architectural mechanisms for explicit communication in shared memory multiprocessors, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.62-es, December 04-08, 1995, San Diego, California, United States

R. Bianchini, L. I. Kontothanassis, R. Pinto, M. De Maria, M. Abud, C. L. Amorim, Hiding communication latency and coherence overhead in software DSMs, ACM SIGOPS Operating Systems Review, v.30 n.5, p.198-209, Dec. 1996

Matthias A. Blumrich, Richard D. Alpert, Yuqun Chen, Douglas W. Clark, Stefanos N. Damianakis, Cezary Dubnicki, Edward W. Felten, Liviu Iftode, Kai Li, Margaret Martonosi, Robert A. Shillner, Design choices in the SHRIMP system: an empirical study, ACM SIGARCH Computer Architecture News, v.26 n.3, p.330-341, June 1998

Richard P. Martin, Amin M. Vahdat, David E. Culler, Thomas E. Anderson, Effects of communication latency, overhead, and bandwidth in a cluster architecture, ACM SIGARCH Computer Architecture News, v.25 n.2, p.85-97, May 1997

Xin-Min Tian, Shashank Nemawarkar, Guang R. Gao, Herbert Hum, Data locality sensitivity of multithreaded computations on a distributed-memory multiprocessor, Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research, p.37, November 12-14, 1996, Toronto, Ontario, Canada

Andrew Sohn, Yuetsu Kodama, Jui Ku, Mitsuhisa Sato, Hirofumi Sakane, Hayato Yamana, Shuichi Sakai, Yoshinori Yamaguchi, Fine-grain multithreading with the EM-X multiprocessor, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.189-198, June 23-25, 1997, Newport, Rhode Island, United States

Wilson C. Hsieh, M. Frans Kaashoek, William E. Weihl, Dynamic computation migration in DSM systems, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.44-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States

A. Grbic, S. Brown, S. Caranci, R. Grindley, M. Gusat, G. Lemieux, K. Loveless, N. Manjikian, S. Srbljic, M. Stumm, Z. Vranesic, Z. Zilic, Design and implementation of the NUMAchine multiprocessor, Proceedings of the 35th annual conference on Design automation, p.66-69, June 15-19, 1998, San Francisco, California, United States

Mainak Chaudhuri, Mark Heinrich, Chris Holt, Jaswinder Pal Singh, Edward Rothberg, John Hennessy, Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation, IEEE Transactions on Computers, v.52 n.7, p.862-880, July 2003

Leonidas Kontothanassis, Robert Stets, Galen Hunt, Umit Rencuzogullari, Gautam Altekar, Sandhya Dwarkadas, Michael L. Scott, Shared memory computing on clusters with symmetric multiprocessors and system area networks, ACM Transactions on Computer Systems (TOCS), v.23 n.3, p.301-335, August 2005

Takashi Nakamura, Toshiyuki Iwamiya, Masahiro Yoshida, Yuichi Matsuo, Masahiro Fukuda, Simulation of the 3 dimensional cascade flow with numerical wind tunnel (NWT), Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.47-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States

Orran Krieger, Marc Auslander, Bryan Rosenburg, Robert W. Wisniewski, Jimi Xenidis, Dilma Da Silva, Michal Ostrowski, Jonathan Appavoo, Maria Butrico, Mark Mergen, Amos Waterland, Volkmar Uhlig, K42: building a complete operating system, ACM SIGOPS Operating Systems Review, v.40 n.4, October 2006

Satish Chandra, Michael Dahlin, Bradley Richards, Randolph Y. Wang, Thomas E. Anderson, James R. Larus, Experience with a language for writing coherence protocols, Proceedings of the Conference on Domain-Specific Languages (DSL), 1997, p.5-5, October 15-17, 1997, Santa Barbara, California

Jianer Chen, GaoCai Wang, Chuang Lin, Tao Wang, GuoJun Wang, Probabilistic analysis on mesh network fault tolerance, Journal of Parallel and Distributed Computing, v.67 n.1, p.100-110, January 2007

B. Brock, G. Carpenter, E. Chiprout, E. Elnozahy, M. Dean, D. Glasco, J. Peterson, R. Rajamony, F. Rawson, R. Rockhold, A. Zimmerman, Windows NT in a ccNUMA system, Proceedings of the 3rd conference on USENIX Windows NT Symposium, p.7-7, July 12-15, 1999, Seattle, Washington

Haroon-Ur-Rashid Haroon-Ur-Rashid, Shi Feng, Ji Weixing, Qiao Baojun, TriBA: a novel scalable architecture for high performance parallel computing applications, Proceedings of the 6th Conference on WSEAS International Conference on Applied Computer Science, p.396-401, April 15-17, 2007, Hangzhou, China

Amin Firoozshahian, Alex Solomatnikov, Ofer Shacham, Zain Asgar, Stephen Richardson, Christos Kozyrakis, Mark Horowitz, A memory system design framework: creating smart memories, ACM SIGARCH Computer Architecture News, v.37 n.3, June 2009