ABSTRACT
Most large shared-memory multiprocessors use directory protocols to keep per-processor caches coherent. Some memory references in such systems, however, suffer long latencies for misses to remotely cached blocks. To ameliorate this latency, researchers have augmented standard coherence protocols with optimizations for specific sharing patterns, such as read-modify-write, producer-consumer, and migratory sharing. This paper seeks to replace these directed solutions with general prediction logic that monitors coherence activity and triggers appropriate coherence actions.

This paper takes the first step toward using general prediction to accelerate coherence protocols by developing and evaluating the Cosmos coherence message predictor. Cosmos predicts the source and type of the next coherence message for a cache block using logic that is an extension of Yeh and Patt's two-level PAp branch predictor. For five scientific applications running on 16 processors, Cosmos has prediction accuracies of 62% to 93%. Cosmos' high prediction accuracy is a result of predictable coherence message signatures that arise from stable sharing patterns of cache blocks.
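The idea of a two-level, per-block message predictor can be illustrated with a short sketch. The code below is not the Cosmos design itself. It assumes that the first level keeps, for each cache block, the last `depth` (sender, message type) pairs, and that the second level maps each such history to the message that followed it most recently. The class name, the `depth` parameter, the message type strings, and the accuracy metric (which counts only messages for which a prediction was available) are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


class MessagePredictor:
    """Illustrative two-level, per-block predictor (assumed structure)."""

    def __init__(self, depth=1):
        self.depth = depth
        # Level 1: per-block history of the last `depth` (sender, type) pairs.
        self.history = defaultdict(lambda: deque(maxlen=depth))
        # Level 2: per-block table from a history pattern to the message
        # that followed that pattern the last time it was seen.
        self.patterns = defaultdict(dict)

    def predict(self, block):
        """Return the predicted next (sender, type), or None if unknown."""
        key = tuple(self.history[block])
        if len(key) < self.depth:
            return None  # not enough history for this block yet
        return self.patterns[block].get(key)

    def update(self, block, message):
        """Record the message that actually arrived for this block."""
        hist = self.history[block]
        if len(hist) == self.depth:
            self.patterns[block][tuple(hist)] = message
        hist.append(message)  # the deque drops the oldest entry itself


def accuracy(trace, depth=1):
    """Fraction of correct predictions among the predictions made."""
    predictor = MessagePredictor(depth)
    hits = total = 0
    for block, message in trace:
        guess = predictor.predict(block)
        if guess is not None:
            total += 1
            hits += guess == message
        predictor.update(block, message)
    return hits / total if total else 0.0


# Hypothetical producer-consumer trace for one block: processor 1 asks for
# write access, then processor 2 asks for read access, repeatedly.
trace = [(0x40, (1, "get_rw")), (0x40, (2, "get_ro"))] * 4
print(accuracy(trace))
```

On a stable sharing pattern like this one, the table learns the alternation after one round, which is the kind of predictable message signature the abstract attributes Cosmos' accuracy to.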