| Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture |
| Full text |
Pdf
(121 KB)
|
| Source
|
Conference on High Performance Networking and Computing
archive
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
table of contents
Baltimore, Maryland
Pages: 1 - 12
Year of Publication: 2002
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
IEEE Computer Society Press
Los Alamitos, CA, USA
|
| Bibliometrics |
Downloads (6 Weeks): 4, Downloads (12 Months): 35, Citation Count: 9
|
|
|
ABSTRACT
Cache misses for which data must be obtained from a remote cache (cache-to-cache transfer misses) account for an important fraction of the total miss rate. Unfortunately, cc-NUMA designs put the access to the directory information into the critical path of 3-hop misses, which significantly penalizes them compared to SMP designs. This work studies the use of owner prediction as a means of providing cc-NUMA multiprocessors with a more efficient support for cache-to-cache transfer misses. Our proposal comprises an effective prediction scheme as well as a coherence protocol designed to support the use of prediction. Results indicate that owner prediction can significantly reduce the latency of cache-to-cache transfer misses, which translates into speed-ups on application performance up to 12%. In order to also accelerate most of those 3-hop misses that are either not predicted or mispredicted, the inclusion of a small and fast directory cache in every node is evaluated, leading to improvements up to 16% on the final performance.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
 |
4
|
E. Ender Bilir , Ross M. Dickson , Ying Hu , Manoj Plakal , Daniel J. Sorin , Mark D. Hill , David A. Wood, Multicast snooping: a new coherence method using a multicast address network, Proceedings of the 26th annual international symposium on Computer architecture, p.294-304, May 01-04, 1999, Atlanta, Georgia, United States
|
| |
5
|
|
| |
6
|
|
 |
7
|
Kourosh Gharachorloo , Madhu Sharma , Simon Steely , Stephen Van Doren, Architecture and design of AlphaServer GS320, Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, p.13-24, November 2000, Cambridge, Massachusetts, United States
|
 |
8
|
Antonio González , Mateo Valero , Nigel Topham , Joan M. Parcerisa, Eliminating cache conflict misses through XOR-based placement functions, Proceedings of the 11th international conference on Supercomputing, p.76-83, July 07-11, 1997, Vienna, Austria
[doi> 10.1145/263580.263599]
|
| |
9
|
A. Gupta, W.-D. Weber and T. Mowry. "Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes". Proc. Int'l Conference on Parallel Processing (ICPP'90), pp. 312--321, August 1990.
|
| |
10
|
L. Gwennap. "Alpha 21364 to Ease Memory Bottleneck". Microprocessor Report, pp. 12--15, October 1998.
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
S. Kaxiras and C. Young. "Coherence Communication Prediction in Shared-Memory Multiprocessors". Proc. of the 6th Int'l High Performance Computer Architecture (HPCA-6), pp. 156--167, January 2000.
|
 |
16
|
|
 |
17
|
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
|
 |
22
|
|
 |
23
|
Steven Cameron Woo , Moriyoshi Ohara , Evan Torrie , Jaswinder Pal Singh , Anoop Gupta, The SPLASH-2 programs: characterization and methodological considerations, Proceedings of the 22nd annual international symposium on Computer architecture, p.24-36, June 22-24, 1995, S. Margherita Ligure, Italy
|
| |
24
|
Z. Zhang. "Architectural Sensitive Application Characterization: The Approach of High-Performance Index-Set (HP-Set)". Technical Report HPL-2001--75, HP Laboratories Palo Alto, March 2001.
|
|