ABSTRACT
Advances in semiconductor technology have set the shared-memory server trend towards processors with multiple cores per die and multiple threads per core. We believe that this technology shift forces a reevaluation of how to interconnect multiple such chips to form larger systems.

This paper argues that by adding support for coherence traps in future chip multiprocessors, large-scale server systems can be formed at a much lower cost. The savings come from shorter design and verification time and a shorter time to market compared with the traditional all-hardware counterpart. In the proposed trap-based memory architecture (TMA), software trap handlers are responsible for obtaining read/write permission, whereas the coherence trap hardware is responsible for the actual permission check.

In this paper we evaluate a TMA implementation (called TMA Lite) with a minimal amount of hardware extensions, all contained within the processor. The proposed mechanisms for coherence trap processing should not affect the critical path and have a negligible cost in terms of area and power for most processor designs.

Our evaluation is based on detailed full-system simulation using out-of-order processors with one or two dual-threaded cores per die as processing nodes. The results show that a TMA-based distributed shared memory system can perform on par with a highly optimized hardware-based design.