ABSTRACT
Previous studies of bus-based shared-memory multiprocessors have shown hybrid write-invalidate/write-update snooping protocols to be incapable of providing consistent performance improvements over write-invalidate protocols. In this paper, we analyze the deficiencies of hybrid snooping protocols under release consistency, and show how these deficiencies can be dramatically reduced by using write caches and read snarfing. Our performance evaluation is based on program-driven simulation and a set of five scientific applications with different sharing behaviors, including migratory sharing as well as producer-consumer sharing. We show that a hybrid protocol, extended with write caches as well as read snarfing, manages to reduce the number of coherence misses by between 83% and 95% as compared to a write-invalidate protocol for all five applications in this study. In addition, the number of bus transactions is reduced by between 36% and 60% for four of the applications and by 9% for the fifth application. Because of the small implementation cost of the hybrid protocol and the two extensions, we believe that this combination is an effective approach to boost the performance of bus-based multiprocessors.