ABSTRACT
Communication has a dominant impact on the performance of massively parallel processors (MPPs). We propose a methodology to evaluate the internode communication performance of MPPs using a controlled set of synthetic workloads. By generating a range of sparse matrices and measuring the performance of a simple parallel algorithm that repeatedly multiplies a sparse matrix by a dense vector, we can determine the relative performance of different communication workloads. Specifiable communication parameters include the number of nodes, the average amount of communication per node, the degree of sharing among the nodes, and the computation-communication ratio. We describe a general procedure for constructing sparse matrices that have these desired communication and computation parameters, and we apply a range of these synthetic workloads to evaluate the hierarchical ring interconnection and cache-only memory architecture (COMA) of the Kendall Square Research KSR1 MPP. The analysis discusses the impact of the KSR1 architecture on communication performance, highlighting the utility and impact of its automatic update feature. It also investigates how system contention affects performance, in particular how contention causes potential updates to be ignored.
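The abstract does not spell out the matrix construction procedure, so the following is only a minimal Python sketch of one way such a workload could be built and characterized, under stated assumptions. Rows and vector elements are block-partitioned across nodes; each node "exports" a fixed number of its vector elements, and each exported element is read by a fixed number of other nodes (the degree of sharing). Local nonzeros per row set the computation. The names `build_matrix`, `communication_stats`, `spmv`, `exported`, `sharing`, and `local_nnz` are hypothetical, and the sketch only counts communication serially; it does not model the KSR1 ring or its automatic update mechanism.

```python
import random
from collections import defaultdict

def owner(col, rows_per_node):
    """Node that owns row/vector element `col` under a block partition."""
    return col // rows_per_node

def build_matrix(num_nodes, rows_per_node, exported, sharing, local_nnz, seed=0):
    """Return a hypothetical synthetic sparse matrix as {row: {col: value}}.

    Node p exports its first `exported` vector elements; each is read by the
    `sharing` nodes p+1..p+sharing (mod num_nodes). Every row also gets
    `local_nnz` nonzeros in columns owned by its own node (computation only).
    """
    if not 1 <= sharing < num_nodes:
        raise ValueError("sharing must be in [1, num_nodes - 1]")
    if exported > rows_per_node or local_nnz > rows_per_node:
        raise ValueError("exported and local_nnz must not exceed rows_per_node")
    rng = random.Random(seed)
    matrix = defaultdict(dict)
    for q in range(num_nodes):
        base = q * rows_per_node
        # Local nonzeros: no communication needed.
        for r in range(base, base + rows_per_node):
            for c in rng.sample(range(base, base + rows_per_node), local_nnz):
                matrix[r][c] = rng.uniform(-1.0, 1.0)
        # Remote nonzeros: node q reads the exported elements of the
        # `sharing` nodes preceding it, so each export has `sharing` readers.
        remote_cols = [((q - k) % num_nodes) * rows_per_node + i
                       for k in range(1, sharing + 1)
                       for i in range(exported)]
        for j, c in enumerate(remote_cols):
            r = base + j % rows_per_node  # spread round-robin over q's rows
            matrix[r][c] = rng.uniform(-1.0, 1.0)
    return matrix

def communication_stats(matrix, num_nodes, rows_per_node):
    """Average remote reads per node, average sharing degree, and
    nonzeros per remote element (a computation-communication ratio)."""
    reads = defaultdict(set)  # node -> distinct remote columns it must fetch
    nnz = defaultdict(int)    # node -> nonzeros it multiplies
    for r, row in matrix.items():
        q = owner(r, rows_per_node)
        nnz[q] += len(row)
        for c in row:
            if owner(c, rows_per_node) != q:
                reads[q].add(c)
    readers = defaultdict(int)  # remote column -> number of nodes reading it
    for cols in reads.values():
        for c in cols:
            readers[c] += 1
    total_reads = sum(len(s) for s in reads.values())
    avg_reads = total_reads / num_nodes
    avg_sharing = sum(readers.values()) / len(readers) if readers else 0.0
    ratio = sum(nnz.values()) / total_reads if total_reads else float("inf")
    return avg_reads, avg_sharing, ratio

def spmv(matrix, x):
    """One y = A x step; in a parallel run each node computes its own rows
    after fetching the remote entries of x that it reads."""
    y = [0.0] * len(x)
    for r, row in matrix.items():
        y[r] = sum(v * x[c] for c, v in row.items())
    return y

if __name__ == "__main__":
    P, n = 8, 64
    A = build_matrix(P, n, exported=4, sharing=3, local_nnz=5)
    print(communication_stats(A, P, n))  # 12 remote reads/node, sharing 3.0
    x = [1.0] * (P * n)
    for _ in range(10):  # the repeated multiply used as the workload
        x = spmv(A, x)
```

In this construction each node reads `exported * sharing` remote elements, so communication volume and sharing can be varied independently, while `local_nnz` raises or lowers the computation performed per remote element.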