Characterizing memory performance in vector multiprocessors
Source: International Conference on Supercomputing
Proceedings of the 6th International Conference on Supercomputing
Washington, D. C., United States
Pages: 35 - 44  
Year of Publication: 1992
ISBN:0-89791-485-6
Authors: J. E. Smith, W. R. Taylor
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 22,   Citation Count: 8
DOI: http://doi.acm.org/10.1145/143369.143379

ABSTRACT

We propose a set of three memory performance measures directed at vector multiprocessors. The first is the port reservation time, which is closely related to the commonly used memory bandwidth measure. The second is the vector fill time, the latency through the memory system for an entire vector operation. The third is the slowest element time, the highest effective latency among all the elements of a vector. The three measures are sufficient to characterize the memory system's influence on the processor's usage of memory ports, functional units, and vector registers, the three main resources that determine vector performance. Simulation results for a next-generation-class vector multiprocessor are given to illustrate typical values for the measures and their interrelationships. These results display a type of bimodal performance behavior in which performance is better for both high and low vectorization levels than for moderate vectorization levels. The results are also used with a simple code sequence to illustrate the effect of memory system delays on chained and non-chained performance. These results suggest that chaining may be more efficient if longer vector lengths are used.
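
The abstract names the three measures but gives no formulas, so the following is only a minimal sketch of one plausible reading, not the authors' definitions. It assumes a hypothetical trace that records, for each element of a single vector load, the cycle its request is issued at the memory port and the cycle its data arrives at the vector register. The names `ElementTiming` and `memory_measures` are illustrative, and treating port reservation time as the inclusive span of issue cycles is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ElementTiming:
    """Hypothetical per-element trace record for one vector load."""
    issue_cycle: int    # cycle the element's request is issued at the memory port
    arrival_cycle: int  # cycle the element's data arrives at the vector register


def memory_measures(elements: List[ElementTiming]) -> Dict[str, int]:
    """Compute one assumed interpretation of the three measures, in cycles."""
    first_issue = min(e.issue_cycle for e in elements)
    last_issue = max(e.issue_cycle for e in elements)
    last_arrival = max(e.arrival_cycle for e in elements)
    return {
        # Assumption: the port is held from the first to the last issue, inclusive.
        "port_reservation_time": last_issue - first_issue + 1,
        # Latency of the whole vector operation: first issue to last arrival.
        "vector_fill_time": last_arrival - first_issue,
        # Largest individual element latency within the vector.
        "slowest_element_time": max(e.arrival_cycle - e.issue_cycle for e in elements),
    }


# Made-up 4-element load in which the third element is delayed in the memory system.
sample = [
    ElementTiming(issue_cycle=0, arrival_cycle=10),
    ElementTiming(issue_cycle=1, arrival_cycle=11),
    ElementTiming(issue_cycle=2, arrival_cycle=20),
    ElementTiming(issue_cycle=3, arrival_cycle=13),
]
measures = memory_measures(sample)
print(measures)
```

In this made-up trace a single delayed element stretches the vector fill time well beyond the port reservation time, which is the kind of separation between the measures that the abstract describes.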


Collaborative Colleagues:
J. E. Smith: colleagues
W. R. Taylor: colleagues