ABSTRACT
Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. We call these systems Distributed Scalable Shared-memory Multiprocessors (DSSMPs).

This paper introduces the design of a shared memory system that uses multiple granularities of sharing, and presents an implementation on the Alewife multiprocessor, called MGS. Multigrain shared memory enables the collaboration of hardware and software shared memory, and is effective at exploiting a form of locality called multigrain locality. The system provides efficient support for fine-grain cache-line sharing, and resorts to coarse-grain page-level sharing only when locality is violated. A framework for characterizing application performance on DSSMPs is also introduced.

Using MGS, an in-depth study of several shared memory applications is conducted to understand the behavior of DSSMPs. We find that unmodified shared memory applications can exploit multigrain sharing. Keeping the number of processors fixed, applications execute up to 85% faster when each DSSMP node is a multiprocessor as opposed to a uniprocessor. We also show that tightly-coupled multiprocessors hold a significant performance advantage over DSSMPs on unmodified applications. However, a best-effort implementation of a kernel from one of the applications allows a DSSMP to almost match the performance of a tightly-coupled multiprocessor.