ABSTRACT
We propose three memory performance measures for vector multiprocessors. The first is the port reservation time, which is closely related to the commonly used memory bandwidth measure. The second is the vector fill time, the latency through the memory system for an entire vector operation. The third is the slowest element time, the highest effective latency among all the elements of a vector. Together, the three measures are sufficient to characterize the memory system's influence on the processor's use of memory ports, functional units, and vector registers, the three main resources that determine vector performance.
Simulation results for a next-generation-class vector multiprocessor illustrate typical values for the measures and their interrelationships. These results display a type of bimodal behavior in which performance is better at both high and low vectorization levels than at moderate vectorization levels. The results are also applied to a simple code sequence to illustrate the effect of memory system delays on chained and non-chained performance, and they suggest that chaining may be more efficient when longer vector lengths are used.
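As a rough illustration of how the three measures differ, the sketch below computes them from a hypothetical per-element trace of one vector load. The abstract does not give formal definitions, so the formulas here are only one plausible reading: the trace format, the class name `VectorLoadTrace`, and the choice to measure each element's latency from its own issue cycle are all assumptions made for this example, not the paper's method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VectorLoadTrace:
    """Hypothetical trace of one vector load (assumed format, not from the paper)."""
    issue: Sequence[int]   # cycle at which each element request leaves the memory port
    arrive: Sequence[int]  # cycle at which each element's data returns to the processor


def port_reservation_time(t: VectorLoadTrace) -> int:
    # Assumption: the port is held from the first to the last request, inclusive.
    return max(t.issue) - min(t.issue) + 1


def vector_fill_time(t: VectorLoadTrace) -> int:
    # Assumption: latency of the whole operation, first request to last arrival.
    return max(t.arrive) - min(t.issue)


def slowest_element_time(t: VectorLoadTrace) -> int:
    # Assumption: an element's effective latency is arrival minus its own issue cycle.
    return max(a - i for i, a in zip(t.issue, t.arrive))


# Four elements; the third is delayed in the memory system, the fourth stalls at the port.
trace = VectorLoadTrace(issue=[0, 1, 2, 4], arrive=[10, 11, 15, 14])
print(port_reservation_time(trace), vector_fill_time(trace), slowest_element_time(trace))
```

Under these assumptions, a stall at the port lengthens the port reservation time, while a delay inside the memory system shows up in the slowest element time and, when it affects the last arrival, in the vector fill time.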