ABSTRACT
Cashmere is a software distributed shared memory (S-DSM) system designed for clusters of server-class machines. It is distinguished from most other S-DSM projects by (1) the effective use of fast user-level messaging, as provided by modern system-area networks, and (2) a “two-level” protocol structure that exploits hardware coherence within multiprocessor nodes. Fast user-level messages change the tradeoffs in coherence protocol design; they allow Cashmere to employ a relatively simple directory-based coherence protocol. Exploiting hardware coherence within SMP nodes improves overall performance when care is taken to avoid interference with inter-node software coherence.

We have implemented Cashmere on a Compaq AlphaServer/Memory Channel cluster, an architecture that provides fast user-level messages. Experiments indicate that a one-level version of the Cashmere protocol provides performance comparable to, or slightly better than, that of TreadMarks' lazy release consistency. Comparisons to Compaq's Shasta protocol also suggest that while fast user-level messages make finer-grain software DSMs competitive, VM-based systems continue to outperform software-based access control for applications without extensive fine-grain sharing.

Within the family of Cashmere protocols, we find that leveraging intranode hardware coherence provides a 37% performance advantage over a more straightforward one-level implementation. Moreover, contrary to our original expectations, noncoherent hardware support for remote memory writes, total message ordering, and broadcast provides comparatively little additional benefit over fast messaging alone for our application suite.
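To illustrate why a two-level structure can reduce software protocol traffic, the following is a minimal, hypothetical sketch and not Cashmere's actual protocol or data structures. It assumes a page-granularity directory that records sharers and writers per node rather than per processor, so a fault by a processor whose node already holds the page is satisfied by intranode hardware coherence without an inter-node message. The class `TwoLevelDirectory`, its methods `fault` and `write_notice_targets`, and the simplification of counting one message per directory update are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical per-page directory entry; sharers are recorded per node."""
    sharer_nodes: set[int] = field(default_factory=set)
    writer_nodes: set[int] = field(default_factory=set)


class TwoLevelDirectory:
    """Toy model: hardware coherence inside a node, software directory across nodes."""

    def __init__(self, num_nodes: int, procs_per_node: int) -> None:
        self.num_nodes = num_nodes
        self.procs_per_node = procs_per_node
        self.entries: dict[int, DirectoryEntry] = {}
        self.inter_node_messages = 0  # simplified count of directory updates

    def node_of(self, proc: int) -> int:
        # Assumption: processors are numbered contiguously within each node.
        return proc // self.procs_per_node

    def entry(self, page: int) -> DirectoryEntry:
        return self.entries.setdefault(page, DirectoryEntry())

    def fault(self, proc: int, page: int, is_write: bool) -> bool:
        """Handle an access fault; return True if inter-node traffic was needed."""
        node = self.node_of(proc)
        e = self.entry(page)
        # A node that already holds the page in the required mode shares it
        # through hardware coherence, so no software message is sent.
        needed = node not in (e.writer_nodes if is_write else e.sharer_nodes)
        if needed:
            self.inter_node_messages += 1  # one directory update (simplified)
            e.sharer_nodes.add(node)
            if is_write:
                e.writer_nodes.add(node)
        return needed

    def write_notice_targets(self, page: int, writer_proc: int) -> set[int]:
        """Nodes (not processors) that must learn of a write at a release."""
        return self.entry(page).sharer_nodes - {self.node_of(writer_proc)}


d = TwoLevelDirectory(num_nodes=4, procs_per_node=4)
d.fault(0, page=7, is_write=False)  # first access by node 0: message sent
d.fault(1, page=7, is_write=False)  # processor 1 reuses node 0's copy
d.fault(5, page=7, is_write=True)   # node 1 becomes a writer: message sent
print(d.write_notice_targets(7, writer_proc=5))  # prints {0}
print(d.inter_node_messages)                     # prints 2
```

In this toy model, adding processors to a node does not enlarge directory entries or the set of write-notice recipients; only the number of nodes does. A one-level protocol would instead treat every processor as a separate sharer.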