ABSTRACT
Access latency in large-scale shared-memory multiprocessors is a concern because most (if not all) memory is one or more hops away through an interconnection network. Providing processors with one or more levels of cache is an accepted way to reduce the average access latency; however, in a multiprocessor, cached values must be kept coherent for the multiprocessor to support the abstraction of a shared global memory. There is no generally accepted hardware solution to provide cache coherence for large-scale shared-memory multiprocessors. Software coherence strategies offer scalability with current hardware. In this paper we examine a compiler-based software strategy for maintaining cache coherence that relies on dependence analysis and a vectorization algorithm to insert cache control directives. Experiments on the BBN TC2000 for a pair of numerical problems show that the run-time cost of coherence using our strategy is lower than that of previously proposed compiler-based software methods, and suggest that it should compare favorably with proposed hardware schemes.