Leveraging cache coherence in active memory systems
Full text: PDF (217 KB)
Source: International Conference on Supercomputing
Proceedings of the 16th International Conference on Supercomputing
New York, New York, USA
Session: Architecture
Pages: 2-13
Year of Publication: 2002
ISBN: 1-58113-483-5
Authors
Daehyun Kim, Cornell University, Ithaca, NY
Mainak Chaudhuri, Cornell University, Ithaca, NY
Mark Heinrich, Cornell University, Ithaca, NY
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/514191.514196

ABSTRACT

Active memory systems help processors overcome the memory wall when applications exhibit poor cache behavior. They consist of either active memory elements that perform data parallel computations in the memory system itself, or an active memory controller that supports address re-mapping techniques that improve data locality. Both active memory approaches create coherence problems, even on uniprocessor systems, since there are either additional processors operating on the data directly, or the processor is allowed to refer to the same data via more than one address. While most active memory implementations require cache flushes, we propose a new technique to solve the coherence problem by extending the coherence protocol. Our active memory controller leverages and extends the coherence mechanism, so that re-mapping techniques work transparently on both uniprocessor and multiprocessor systems.

We present a microarchitecture for an active memory controller with a programmable core and specialized hardware that accelerates cache line assembly and disassembly. We present detailed simulation results that show uniprocessor speedup from 1.3 to 7.6 on a range of applications and microbenchmarks. In addition to uniprocessor speedup, we show single-node multiprocessor speedup for parallel active memory applications and discuss how the same controller architecture supports coherent multi-node systems called active memory clusters.
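
The address re-mapping and cache line assembly mentioned in the abstract can be illustrated with a toy model. The sketch below is not the paper's microarchitecture or protocol: the matrix-transpose mapping, the class name ToyActiveMemoryController, its method names, and the four-word line size are all assumptions chosen for illustration. It only shows how one re-mapped ("shadow") cache line gathers words from several ordinary lines, which is why the same data becomes reachable through more than one address.

```python
# Toy model only: every name and size here is an illustrative assumption,
# not something taken from the paper.
WORDS_PER_LINE = 4  # assumed cache line size, in words


class ToyActiveMemoryController:
    """Serves a row-major N x N matrix plus a re-mapped view of its transpose."""

    def __init__(self, n):
        self.n = n
        self.memory = list(range(n * n))  # physical storage, row-major

    def remap(self, shadow_addr):
        """Map a word address in the shadow (transposed) space to a physical address."""
        row, col = divmod(shadow_addr, self.n)
        return col * self.n + row

    def assemble_line(self, shadow_line):
        """Cache line assembly: gather strided physical words into one line."""
        base = shadow_line * WORDS_PER_LINE
        return [self.memory[self.remap(base + i)] for i in range(WORDS_PER_LINE)]

    def disassemble_line(self, shadow_line, data):
        """Cache line disassembly: scatter a written-back shadow line to memory."""
        base = shadow_line * WORDS_PER_LINE
        for i, word in enumerate(data):
            self.memory[self.remap(base + i)] = word

    def aliased_normal_lines(self, shadow_line):
        """Ordinary lines that share data with a shadow line.

        These are the lines that must stay consistent with the shadow line,
        whether by flushing caches or by a coherence mechanism.
        """
        base = shadow_line * WORDS_PER_LINE
        return sorted(
            {self.remap(base + i) // WORDS_PER_LINE for i in range(WORDS_PER_LINE)}
        )


ctrl = ToyActiveMemoryController(4)
first_column = ctrl.assemble_line(0)       # one line holding a whole matrix column
overlapping = ctrl.aliased_normal_lines(0)  # ordinary lines aliased by that line
```

In this toy setting a single shadow line overlaps every ordinary line of the matrix, so a stale cached copy of any of them would make the shadow view inconsistent. That aliasing is the coherence problem the abstract describes.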



