ABSTRACT
Recent work has shown that silent stores (stores that write a value matching the one already stored at the memory location) occur quite frequently and can be exploited to reduce memory traffic and improve performance. This paper extends the definition of silent stores to encompass sets of stores that change the value stored at a memory location, but only temporarily, and subsequently return a previous value of interest to the memory location. The stores that cause the value to revert are called temporally silent stores. We redefine multiprocessor sharing to account for temporal silence and show that in the limit, up to 45% of communication misses in scientific and commercial applications can be eliminated by exploiting values that change only temporarily. We describe a practical mechanism that detects temporally silent stores and removes the coherence traffic they cause in conventional multiprocessors. We find that up to 42% of communication misses can be eliminated with a simple extension to the MESI protocol. Further, we examine application and operating system code to provide insight into the temporal silence phenomenon, and we characterize temporal silence by examining value frequencies and dynamic instruction distances between temporally silent pairs. These studies indicate that the operating system is heavily involved in temporal silence in both commercial and scientific workloads, and that while detectable synchronization primitives contribute substantially, significant opportunity exists outside these references.
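The distinction between silent and temporally silent stores can be illustrated with a small trace classifier. This is only a sketch under stated assumptions, not the detection mechanism the paper describes: the function name `classify_stores`, the `(address, value)` trace format, and the choice to treat "the previous value of interest" as the value held immediately before the most recent change are all hypothetical simplifications.

```python
from typing import Dict, Iterable, List, Tuple


def classify_stores(trace: Iterable[Tuple[int, int]], initial: int = 0) -> List[str]:
    """Label each store in an (address, value) trace.

    Assumption: every location starts at `initial`, and only one earlier
    value per address is remembered (the value before the latest change).
    """
    current: Dict[int, int] = {}   # value now held at each address
    previous: Dict[int, int] = {}  # value held before the latest change
    labels: List[str] = []
    for addr, value in trace:
        cur = current.get(addr, initial)
        if value == cur:
            # The store writes the value already present: a silent store.
            labels.append("silent")
            continue
        if addr in previous and value == previous[addr]:
            # The store reverts the location to its earlier value.
            labels.append("temporally silent")
        else:
            labels.append("ordinary")
        previous[addr] = cur
        current[addr] = value
    return labels


# A lock word set to 1 and then released back to 0, followed by a
# repeated write of the same value to a second address.
example = [(0x10, 1), (0x10, 0), (0x20, 7), (0x20, 7)]
print(classify_stores(example))
# ['ordinary', 'temporally silent', 'ordinary', 'silent']
```

In this toy trace the release of the lock word is the temporally silent store: it restores the value the location held before the acquire, which is the kind of reversion the abstract identifies as an opportunity to avoid coherence traffic.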