ABSTRACT
The implementation presented in this paper, DSZOOM-WF, is a sequentially consistent, fine-grained, distributed software-based shared memory. It demonstrates a protocol-handling overhead below one microsecond for all the actions involved in a remote load operation, compared with around ten microseconds for the fastest prior implementation.

The all-software protocol is implemented assuming some basic low-level primitives in the cluster interconnect and an operating-system bypass functionality, similar to the emerging InfiniBand standard. All interrupt-based and poll-based asynchronous protocol processing is removed by running the entire coherence protocol in the requesting processor. This not only removes the asynchronous overhead but also makes use of a processor that would otherwise stall. The technique is applicable to both page-based and fine-grained software-based shared memory.

DSZOOM-WF consistently demonstrates performance comparable to hardware-based distributed shared memory implementations.