ABSTRACT
In this paper, cache design is explored for large high-performance multiprocessors with hundreds or thousands of processors and memory modules interconnected by a pipelined multi-stage network. The majority of the multiprocessor cache studies in the literature focus exclusively on the issue of cache coherence enforcement. However, there are other characteristics unique to such multiprocessors which create an environment for cache performance that is very different from that of many uniprocessors.
Multiprocessor conditions are identified and modeled, including: 1) the cost of a cache coherence enforcement scheme, 2) the effect of a high degree of overlap between cache miss services, 3) the cost of a pin-limited data path between shared memory and caches, 4) the effect of a high degree of data prefetching, 5) the program behavior of a scientific workload as represented by 23 numerical subroutines, and 6) the parallel execution of programs. This model is used to show that the cache miss ratio is not a suitable performance measure in the multiprocessors of interest and to show that the optimal cache block size in such multiprocessors is much smaller than in many uniprocessors.