Active memory operations
Source
International Conference on Supercomputing
Proceedings of the 21st annual international conference on Supercomputing
Seattle, Washington
SESSION: Architecture -- multiprocessor systems
Pages: 232-241
Year of Publication: 2007
ISBN: 978-1-59593-768-1
Authors
Zhen Fang  Intel Corp., Hillsboro, OR
Lixin Zhang  IBM Austin Research Lab, Austin, TX
John B. Carter  University of Utah, Salt Lake City, UT
Ali Ibrahim  AMD, Santa Clara, CA
Michael A. Parker  Cray, Inc., Chippewa Falls, WI
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1274971.1275004

ABSTRACT

The performance of modern microprocessors is increasingly limited by their inability to hide main memory latency. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose the use of Active Memory Operations (AMOs), in which select operations can be sent to and executed on the data's home memory controller. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips.

In this paper we present architectural and programming models for AMOs, and compare their performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50X faster barriers, 12X faster spinlocks, 8.5X-15X faster stream/array operations, and 3X faster database queries. Based on a standard-cell implementation, we predict that the circuitry required to support AMOs is less than 1% of the typical chip area of a high-performance microprocessor.
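The central idea in the abstract, sending an operation to the data's home memory controller rather than pulling the data to the processor, can be illustrated with a toy message-count model of processors arriving at a barrier counter. This is only a sketch under stated assumptions: the class and function names (`HomeMemoryController`, `amo_inc`, `fetch_exclusive`) and the per-step message costs are hypothetical and are not taken from the paper, its hardware design, or its simulator.

```python
"""Toy model: incrementing a shared counter with and without a
controller-side operation. All names and message costs are
hypothetical illustrations, not the paper's design."""


class HomeMemoryController:
    """Owns the memory words and can apply simple operations locally."""

    def __init__(self):
        self.memory = {}    # address -> value
        self.messages = 0   # network messages counted by this toy model

    def amo_inc(self, addr):
        """Increment at the home node. Assumed cost: request + reply."""
        self.messages += 2
        self.memory[addr] = self.memory.get(addr, 0) + 1
        return self.memory[addr]

    def fetch_exclusive(self, addr, previous_owner):
        """Conventional path: the requester pulls the line into its cache.

        Assumed cost: request + data reply, plus an invalidate and an
        acknowledgement if another processor currently holds the line.
        """
        self.messages += 2
        if previous_owner is not None:
            self.messages += 2
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        """Record the new value; modelled as free (no extra messages)."""
        self.memory[addr] = value


def barrier_arrivals_conventional(n_procs, addr=0):
    """Each processor takes ownership of the line, then increments it."""
    mc = HomeMemoryController()
    owner = None
    for proc in range(n_procs):
        value = mc.fetch_exclusive(addr, owner)
        mc.store(addr, value + 1)
        owner = proc
    return mc.memory[addr], mc.messages


def barrier_arrivals_amo(n_procs, addr=0):
    """Each processor asks the home controller to do the increment."""
    mc = HomeMemoryController()
    for _ in range(n_procs):
        mc.amo_inc(addr)
    return mc.memory[addr], mc.messages


if __name__ == "__main__":
    print(barrier_arrivals_conventional(8))
    print(barrier_arrivals_amo(8))
```

Under these assumed costs, the controller-side version never migrates the line between processors, so it counts fewer messages for the same final counter value. The counts reflect only this toy cost model and say nothing about the measured results reported in the paper.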


