Shared memory computing on clusters with symmetric multiprocessors and system area networks
Source: ACM Transactions on Computer Systems (TOCS)
Volume 23, Issue 3 (August 2005)
Pages: 301 - 335  
Year of Publication: 2005
ISSN:0734-2071
Authors
Leonidas Kontothanassis  HP Labs, Cambridge, MA
Robert Stets  Google, Inc., Mountain View, CA
Galen Hunt  Microsoft Research, Redmond, WA
Umit Rencuzogullari  VMware, Inc., Palo Alto, CA
Gautam Altekar  University of California, Berkeley, Berkeley, CA
Sandhya Dwarkadas  University of Rochester, Rochester, NY
Michael L. Scott  University of Rochester, Rochester, NY
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 23,   Downloads (12 Months): 201,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/1082469.1082472

ABSTRACT

Cashmere is a software distributed shared memory (S-DSM) system designed for clusters of server-class machines. It is distinguished from most other S-DSM projects by (1) the effective use of fast user-level messaging, as provided by modern system-area networks, and (2) a “two-level” protocol structure that exploits hardware coherence within multiprocessor nodes. Fast user-level messages change the tradeoffs in coherence protocol design; they allow Cashmere to employ a relatively simple directory-based coherence protocol. Exploiting hardware coherence within SMP nodes improves overall performance when care is taken to avoid interference with inter-node software coherence.

We have implemented Cashmere on a Compaq AlphaServer/Memory Channel cluster, an architecture that provides fast user-level messages. Experiments indicate that a one-level version of the Cashmere protocol provides performance comparable to, or slightly better than, that of TreadMarks' lazy release consistency. Comparisons to Compaq's Shasta protocol also suggest that while fast user-level messages make finer-grain software DSMs competitive, VM-based systems continue to outperform software-based access control for applications without extensive fine-grain sharing.

Within the family of Cashmere protocols, we find that leveraging intranode hardware coherence provides a 37% performance advantage over a more straightforward one-level implementation. Moreover, contrary to our original expectations, noncoherent hardware support for remote memory writes, total message ordering, and broadcast provides comparatively little in the way of additional benefits over just fast messaging for our application suite.



