Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems
Source: International Symposium on Computer Architecture
Proceedings of the 26th Annual International Symposium on Computer Architecture
Atlanta, Georgia, United States
Pages: 282 - 293  
Year of Publication: 1999
ISBN:0-7695-0170-2
Authors
Angelos Bilas  Dept. of Elec. and Comp. Eng., 10 King's College Road, University of Toronto, Toronto, ON M5S 3G4, Canada
Cheng Liao  Dept. of Computer Science, 35 Olden Street, Princeton University, Princeton, NJ
Jaswinder Pal Singh  Dept. of Computer Science, 35 Olden Street, Princeton University, Princeton, NJ
Sponsors
IEEE-CS\TCCA: TC on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA
DOI: http://doi.acm.org/10.1145/300979.301003

ABSTRACT

The performance of page-based software shared virtual memory (SVM) is still far from that achieved on hardware-coherent distributed shared memory (DSM) systems. The interrupt cost for asynchronous protocol processing has been found to be a key source of performance loss and complexity.

This paper shows that by providing simple and general support for asynchronous message handling in a commodity network interface (NI), and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. They also require neither visibility into the node memory system nor code instrumentation to identify memory operations. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support in a shared Memory Abstraction), on a cluster of SMPs with a programmable NI, though the mechanisms are simple and do not require programmability.

We find that the performance improvements are substantial, bringing performance on a small-scale SMP cluster much closer to that of hardware-coherent shared memory for many applications, and we show the value of each of the mechanisms in different applications. Application performance improves by about 37% on average for reasonably well performing applications, even on our relatively slow programmable NI, and more for others. We discuss the key remaining bottlenecks at the protocol level and use a firmware performance monitor in the NI to understand the interactions with and the implications for the communication layer.


