TMA: a trap-based memory architecture
Source: International Conference on Supercomputing
Proceedings of the 20th Annual International Conference on Supercomputing
Cairns, Queensland, Australia
SESSION: Memory
Pages: 259 - 268  
Year of Publication: 2006
ISBN:1-59593-282-8
Authors
Håkan Zeffer  Uppsala University, Uppsala, SWEDEN
Zoran Radović  Uppsala University, Uppsala, SWEDEN
Martin Karlsson  Uppsala University, Uppsala, SWEDEN
Erik Hagersten  Uppsala University, Uppsala, SWEDEN
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1183401.1183438

ABSTRACT

Advances in semiconductor technology have set the shared-memory server trend towards processors with multiple cores per die and multiple threads per core. We believe that this technology shift forces a reevaluation of how to interconnect multiple such chips to form larger systems.

This paper argues that by adding support for coherence traps in future chip multiprocessors, large-scale server systems can be formed at a much lower cost. The saving comes from shorter design time, verification time, and time to market compared with the traditional all-hardware counterpart. In the proposed trap-based memory architecture (TMA), software trap handlers are responsible for obtaining read/write permission, whereas the coherence trap hardware is responsible for the actual permission check.

In this paper we evaluate a TMA implementation (called TMA Lite) with a minimal amount of hardware extensions, all contained within the processor. The proposed mechanisms for coherence trap processing should not affect the critical path and should have a negligible cost in terms of area and power for most processor designs.

Our evaluation is based on detailed full-system simulation using out-of-order processors with one or two dual-threaded cores per die as processing nodes. The results show that a TMA-based distributed shared memory system can perform on par with a highly optimized hardware-based design.
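The division of labour described in the abstract (hardware only checks permission, software obtains it) can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the TMA Lite design: the names `Perm`, `CoherenceTrap`, `Node`, and `System` are hypothetical, and the single-writer/multiple-reader invalidation scheme in `System.acquire` is a generic stand-in, since the abstract does not describe the actual protocol.

```python
from enum import Enum


class Perm(Enum):
    INVALID = 0
    READ = 1
    WRITE = 2


class CoherenceTrap(Exception):
    """Raised by the simulated hardware check when permission is missing."""

    def __init__(self, line, wanted):
        super().__init__(f"coherence trap on line {line} ({wanted.name})")
        self.line = line
        self.wanted = wanted


class System:
    """Hypothetical multi-node system with a shared backing memory."""

    def __init__(self, n_nodes):
        self.memory = {}  # line -> last written-back value
        self.nodes = [Node(i, self) for i in range(n_nodes)]

    def acquire(self, requester, line, wanted):
        # Assumed protocol: one writer or many readers per line.
        for other in self.nodes:
            if other is requester:
                continue
            have = other.perms.get(line, Perm.INVALID)
            if have is Perm.WRITE:
                # The writer holds the newest value; write it back first.
                self.memory[line] = other.data[line]
            if wanted is Perm.WRITE and have is not Perm.INVALID:
                other.perms[line] = Perm.INVALID
                other.data.pop(line, None)
            elif wanted is Perm.READ and have is Perm.WRITE:
                other.perms[line] = Perm.READ
        requester.data[line] = self.memory.get(line, 0)
        requester.perms[line] = wanted


class Node:
    def __init__(self, node_id, system):
        self.node_id = node_id
        self.system = system
        self.perms = {}  # line -> Perm held locally
        self.data = {}   # line -> local copy of the value
        self.traps = 0   # number of coherence traps taken

    def _check(self, line, wanted):
        # "Hardware" side: a local comparison only, no communication.
        if self.perms.get(line, Perm.INVALID).value < wanted.value:
            raise CoherenceTrap(line, wanted)

    def _handle_trap(self, trap):
        # "Software" side: the trap handler obtains the permission.
        self.traps += 1
        self.system.acquire(self, trap.line, trap.wanted)

    def load(self, line):
        try:
            self._check(line, Perm.READ)
        except CoherenceTrap as trap:
            self._handle_trap(trap)
        return self.data[line]

    def store(self, line, value):
        try:
            self._check(line, Perm.WRITE)
        except CoherenceTrap as trap:
            self._handle_trap(trap)
        self.data[line] = value
```

In this model an access that already holds sufficient permission never enters the handler, which mirrors the abstract's point that the common-case check stays in hardware while the rarer permission acquisition is left to software.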



