ABSTRACT
We propose a set of three memory performance measures directed at vector multiprocessors. The first is the port reservation time, which is closely related to the commonly used memory bandwidth measure. The second is the vector fill time, the latency through the memory system for an entire vector operation. The third is the slowest element time, the highest effective latency among all the elements of a vector. Together, the three measures are sufficient to characterize the memory system's influence on the processor's use of memory ports, functional units, and vector registers, which are the three main resources that determine vector performance.
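The abstract defines the three measures only informally. The following is a minimal sketch of how they might be computed from a per-element memory trace. It is not the paper's formulation. The trace format (an issue cycle and a return cycle for each element), the function name `measure_vector_access`, and the exact formulas are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VectorMemoryMeasures:
    port_reservation_time: int  # cycles the memory port is held by this vector access
    vector_fill_time: int       # cycles from first request to last element returned
    slowest_element_time: int   # largest issue-to-return latency of any single element


def measure_vector_access(issue: Sequence[int], ret: Sequence[int]) -> VectorMemoryMeasures:
    """Hypothetical measures for one vector memory operation.

    ASSUMPTIONS (not from the paper): element i is requested at cycle issue[i]
    and its data returns at cycle ret[i]; the port is busy from the first
    request through the cycle in which the last request is issued.
    """
    if len(issue) == 0 or len(issue) != len(ret):
        raise ValueError("need equal-length, non-empty issue/return traces")
    if any(r < i for i, r in zip(issue, ret)):
        raise ValueError("an element cannot return before it is issued")

    start = min(issue)
    # Port reservation: first request up to and including the last request's issue cycle.
    port = max(issue) - start + 1
    # Vector fill: first request until the last element arrives back.
    fill = max(ret) - start
    # Slowest element: the worst individual issue-to-return latency.
    slowest = max(r - i for i, r in zip(issue, ret))
    return VectorMemoryMeasures(port, fill, slowest)


# Conflict-free 4-element access: one request per cycle, uniform 10-cycle latency.
print(measure_vector_access([0, 1, 2, 3], [10, 11, 12, 13]))
# A bank conflict delays the third request and lengthens its latency.
print(measure_vector_access([0, 1, 3, 4], [10, 11, 20, 14]))
```

Under these assumptions, the number of elements divided by the port reservation time gives an effective bandwidth in elements per cycle. This is one way to read the stated link between the first measure and conventional bandwidth. A single slow element raises the fill and slowest element times without changing the port time by much.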
Simulation results for a next-generation-class vector multiprocessor are given to illustrate typical values for the measures and their interrelationships. These results exhibit a bimodal performance behavior: performance is better at both high and low vectorization levels than at moderate vectorization levels. The results are also applied to a simple code sequence to illustrate the effect of memory system delays on chained and non-chained performance. They suggest that chaining may be more efficient when longer vector lengths are used.