ABSTRACT
Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache coherence in large-scale shared-memory multiprocessors, but both of these approaches have significant limitations. We examine the potential performance improvement of a new software-hardware controlled cache coherence mechanism. This approach augments the run-time information available to a directory-based coherence mechanism with compile-time analysis that statically identifies write references that cannot cause coherence problems and writes that should be written through to memory. These references are marked as not needing to send invalidation messages, which reduces the network traffic produced by the directory while maintaining cache consistency. For memory references that are ambiguous, for instance because of conditional branches or the need for complex data flow analysis, the compiler conservatively marks the references and relies on the hardware directory to ensure coherence. Trace-driven simulations are used to emulate the compile-time analysis on memory traces and to estimate the potential performance improvement that could be expected from a compiler performing this optimization on the Perfect Club benchmark programs. By reducing the number of invalidations, this optimized directory scheme is capable of reducing the processor-memory network traffic by up to 54 percent compared to an unoptimized directory mechanism. In addition, the overall miss ratio can be reduced by up to 42 percent due to a corresponding reduction in the number of write misses.
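The idea of marking writes so that the directory can skip invalidation messages can be illustrated with a toy trace-driven count. This is only a sketch under stated assumptions, not the simulator or protocol used in the paper: the trace format, the `marked` flag, and the function name `count_invalidations` are all hypothetical, and the sketch assumes that a marked write is one the compiler has proven safe to perform without invalidating other cached copies.

```python
from collections import defaultdict


def count_invalidations(trace, honor_marks):
    """Count invalidation messages sent by a toy directory for a memory trace.

    trace: iterable of (cpu, op, block, marked) tuples, where op is "R" or "W".
    marked: hypothetical compile-time flag meaning "this write needs no
            invalidation messages" (an assumption made for this sketch).
    honor_marks: False models an unoptimized directory that ignores the flag.
    """
    sharers = defaultdict(set)  # block -> set of CPUs recorded as holding a copy
    messages = 0
    for cpu, op, block, marked in trace:
        if op == "W" and not (honor_marks and marked):
            # Ordinary write: invalidate every other recorded sharer.
            messages += len(sharers[block] - {cpu})
            sharers[block] = {cpu}
        else:
            # Read, or a marked write that skips invalidations:
            # just record that this CPU holds the block.
            sharers[block].add(cpu)
    return messages


# Hypothetical six-reference trace over two blocks, 0xA and 0xB.
trace = [
    (0, "R", 0xA, False),
    (1, "R", 0xA, False),
    (2, "R", 0xA, False),
    (0, "W", 0xA, True),   # marked by the (emulated) compiler
    (1, "R", 0xB, False),
    (0, "W", 0xB, False),  # ambiguous reference, left to the directory
]

unoptimized = count_invalidations(trace, honor_marks=False)
optimized = count_invalidations(trace, honor_marks=True)
print(unoptimized, optimized)
```

On this made-up trace the unoptimized count is 3 messages and the optimized count is 1. The numbers only illustrate the mechanism and say nothing about the reductions reported for the Perfect Club benchmarks.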