ABSTRACT
Cache affinity is important to the performance of scalable shared memory multiprocessors. For multiprocessors without hardware cache coherence support, software cache coherence is the only alternative. Most existing software cache coherence schemes ignore cache affinity across parallel loops. In this paper, we propose a new scheme, the Cache Affinity-based Software cache coherence scheme (CAS), which exploits cache affinity across parallel loops to achieve high cache hit ratios without requiring extra hardware support. Experimental results show that the new scheme outperforms existing schemes.