Accelerating shared virtual memory via general-purpose network interface support
Source: ACM Transactions on Computer Systems (TOCS)
Volume 19, Issue 1 (February 2001)
Pages: 1-35
Year of Publication: 2001
ISSN: 0734-2071
Authors
Angelos Bilas, Univ. of Toronto, Toronto, Ont., Canada
Dongming Jiang, Princeton Univ., Princeton, NJ
Jaswinder Pal Singh, Princeton Univ., Princeton, NJ
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/367742.367747

ABSTRACT

Clusters of symmetric multiprocessors (SMPs) are important platforms for high-performance computing. With the success of hardware cache-coherent distributed shared memory (DSM), considerable effort has also gone into supporting the coherent shared-address-space programming model in software on clusters. Much research has been done on fast communication on clusters and on protocols for supporting software shared memory across them. However, the performance of shared virtual memory (SVM) is still far from that achieved on hardware DSM systems. The goal of this paper is to improve the performance of SVM on system area network clusters by considering the interactions between the communication and protocol layers. We first examine which communication system bottlenecks stand in the way of improving the parallel performance of SVM clusters: in particular, which parameters of the communication architecture are most important to improve further relative to processor speed, which are already adequate on modern systems for most applications, and how this will change with technology in the future. We find that the most important communication subsystem cost to improve is the overhead of generating and delivering interrupts for asynchronous protocol processing. We then show that, by providing simple and general support for asynchronous message handling in a commodity network interface (NI) and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling, and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support for shared Memory Abstractions), on a cluster of SMPs with a programmable NI. We find that the performance improvements are substantial, bringing performance on a small-scale SMP cluster much closer to that of hardware-coherent shared memory for many applications, and we show the value of each of the mechanisms in different applications.


REVIEW

Reviewer: Veronica Lagrange

This paper describes research to improve the performance of shared virtual memory (SVM) in clustered environments by targeting potential bottlenecks in the interactions between the communication and protocol layers. Host overhead, I/O bus bandw...