An evaluation of a compiler optimization for improving the performance of a coherence directory
Source: International Conference on Supercomputing
Proceedings of the 8th International Conference on Supercomputing
Manchester, England
Pages: 75 - 84  
Year of Publication: 1994
ISBN:0-89791-665-4
Authors
Farnaz Mounes-Toussi  Department of Electrical Engineering, 200 Union Street S. E., University of Minnesota, Minneapolis, MN
David J. Lilja  Department of Electrical Engineering, 200 Union Street S. E., University of Minnesota, Minneapolis, MN
Zhiyuan Li  Department of Computer Science, 200 Union Street S. E., University of Minnesota, Minneapolis, MN
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/181181.181281

ABSTRACT

Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache coherence in large-scale shared-memory multiprocessors, but both approaches have significant limitations. We examine the potential performance improvement of a new combined software-hardware cache coherence mechanism. This approach augments the run-time information available to a directory-based coherence mechanism with compile-time analysis that statically identifies write references that cannot cause coherence problems and writes that should be written through to memory. These references are marked as not needing to send invalidation messages, which reduces the network traffic produced by the directory while maintaining cache consistency. For memory references that are ambiguous, for instance because of conditional branches or the need for complex data-flow analysis, the compiler conservatively marks the references and relies on the hardware directory to ensure coherence. Trace-driven simulations emulate the compile-time analysis on memory traces and estimate the potential performance improvement that could be expected from a compiler performing this optimization on the Perfect Club benchmark programs. By reducing the number of invalidations, the optimized directory scheme can reduce processor-memory network traffic by up to 54 percent compared to an unoptimized directory mechanism. In addition, the overall miss ratio can be reduced by up to 42 percent due to a corresponding reduction in the number of write misses.
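
The idea of counting invalidation messages over a memory trace can be illustrated with a toy model. The sketch below is not the simulator used in the paper. The trace format, the `marked` flag (standing in for the compiler's annotation), and the function name `count_invalidations` are all assumptions made for illustration. It models only a per-block sharer set and ignores write-through, cache capacity, and miss ratios.

```python
from collections import defaultdict


def count_invalidations(trace, use_marks):
    """Count invalidation messages a toy directory sends for a trace.

    trace: iterable of (cpu, op, block, marked) tuples; op is "R" or "W".
    marked: hypothetical compiler annotation meaning "this write needs no
    invalidation messages". It is ignored when use_marks is False.
    """
    sharers = defaultdict(set)  # block -> set of CPUs holding a copy
    messages = 0
    for cpu, op, block, marked in trace:
        if op == "R":
            # A read simply registers the reader as a sharer.
            sharers[block].add(cpu)
        elif use_marks and marked:
            # Marked write: assumed safe by compile-time analysis,
            # so the directory sends no invalidations.
            sharers[block].add(cpu)
        else:
            # Unmarked (or ambiguous) write: invalidate every other copy.
            messages += len(sharers[block] - {cpu})
            sharers[block] = {cpu}
    return messages


# Illustrative trace (invented data, not from the Perfect Club benchmarks).
trace = [
    (0, "R", "A", False),
    (1, "R", "A", False),
    (2, "R", "A", False),
    (0, "W", "A", True),   # assumed provably safe at compile time
    (1, "R", "B", False),
    (0, "W", "B", False),  # ambiguous, left to the hardware directory
]

baseline = count_invalidations(trace, use_marks=False)
optimized = count_invalidations(trace, use_marks=True)
print(baseline, optimized)
```

In this toy trace the marked write to block `A` avoids two invalidation messages, while the unmarked write to block `B` still relies on the directory, mirroring the conservative fallback described in the abstract.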


