Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Source International Symposium on Computer Architecture archive
Proceedings of the 25th annual international symposium on Computer architecture table of contents
Barcelona, Spain
Pages: 342 - 355  
Year of Publication: 1998
ISBN:0-8186-8491-7
Authors
Vijayaraghavan Soundararajan  Computer Systems Lab, Stanford University, Stanford, CA
Mark Heinrich  Computer Systems Lab, Stanford University, Stanford, CA
Ben Verghese  Digital Equipment Corporation, Western Research Lab, Palo Alto, CA
Kourosh Gharachorloo  Digital Equipment Corporation, Western Research Lab, Palo Alto, CA
Anoop Gupta  Computer Systems Lab, Stanford University, Stanford, CA and Microsoft Corporation, Redmond, WA
John Hennessy  Computer Systems Lab, Stanford University, Stanford, CA
Sponsors
IEEE-CS\TCCA: TC on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 20,   Downloads (12 Months): 61,   Citation Count: 15
DOI: http://doi.acm.org/10.1145/279358.279403

ABSTRACT

Given the limitations of bus-based multiprocessors, CC-NUMA is the scalable architecture of choice for shared-memory machines. The most important characteristic of the CC-NUMA architecture is that the latency to access data on a remote node is considerably larger than the latency to access local memory. On such machines, good data locality can reduce memory stall time and is therefore a critical factor in application performance.

In this paper we study the various options available to system designers to transparently decrease the fraction of data misses serviced remotely. This work is done in the context of the Stanford FLASH multiprocessor. FLASH is unique in that each node has a single pool of DRAM that can be used in a variety of ways by the programmable memory controller. We use the programmability of FLASH to explore different options for cache-coherence and data-locality in compute-server workloads. First, we consider two protocols for providing base cache-coherence, one with centralized directory information (dynamic pointer allocation) and another with distributed directory information (SCI). While several commercial systems are based on SCI, we find that a centralized scheme has superior performance. Next, we consider different hardware and software techniques that use some or all of the local memory in a node to improve data locality. Finally, we propose a hybrid scheme that combines hardware and software techniques. These schemes work on the same base platform with both user and kernel references from the workloads. The paper thus offers a realistic and fair comparison of replication/migration techniques that has not previously been feasible.


