ABSTRACT
Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of the interactions between the multiple levels of interconnection. In this paper, we present the first multi-protocol implementation of a lightweight message layer: a version of Active Messages-II running on a cluster of Sun Enterprise 5000 servers connected with Myrinet. This research brings together several pieces of high-performance interconnection technology: bus backplanes for symmetric multiprocessors (SMPs), low-latency networks for connections between machines, and simple, user-level primitives for communication. The paper describes the shared-memory message-passing protocol and analyzes the multi-protocol implementation with both microbenchmarks and Split-C applications. Three aspects of the communication layer are critical to performance: the overhead of cache-coherence mechanisms, the method of managing concurrent access, and the cost of accessing state with the slower protocol. Through the use of an adaptive polling strategy, the multi-protocol implementation limits performance interactions between the protocols, delivering up to 160 MB/s of bandwidth with 3.6-microsecond end-to-end latency. Applications within an SMP benefit from this fast communication, running up to 75% faster than on a network of uniprocessor workstations (NOW). Applications running on the entire Clump are limited by the ratio of network interface cards (NICs) to processors in our system, and are typically slower than on the NOW. These results illustrate several potential pitfalls for the Clumps architecture.