ABSTRACT
Recently, considerable effort has gone into providing cost-effective shared memory systems through software-only solutions on clusters of high-end workstations connected by high-bandwidth, low-latency commodity networks. Much of the work so far has focused on improving protocols, and some has addressed restructuring applications to perform better on shared virtual memory (SVM) systems. The result of this progress has been the promise of good performance on a range of applications, at least in the 16-32 processor range. New system area networks and network interfaces provide significantly lower overhead, lower latency, and higher bandwidth communication in clusters; inexpensive SMPs have become common as the nodes of these clusters; and SVM protocols are now quite mature. With this progress, it is now useful to examine which system bottlenecks stand in the way of effective parallel performance: in particular, which parameters of the communication architecture are most important to improve further relative to processor speed, which are already adequate on modern systems for most applications, and how this will change with technology in the future. Such information can help system designers determine where to focus their energies in improving performance, and help users determine which system characteristics are appropriate for their applications.

We find that the most important system cost to improve is the overhead of generating and delivering interrupts. Improving network interface (and I/O bus) bandwidth relative to processor speed helps some bandwidth-bound applications, but currently available ratios of bandwidth to processor speed are already adequate for many others. Surprisingly, neither the processor overhead for handling messages nor the occupancy of the communication interface in preparing and pushing packets through the network appears to require much improvement.