ABSTRACT
The shared memory research community has proposed many complex communication protocols that aim to eliminate specific performance bottlenecks while still providing an easy-to-use communication interface. Although tailored protocols can eliminate some bottlenecks that arise in real applications, removing the cause of a bottleneck through software optimizations and bug fixes is cheaper to implement, faster to carry out (once the cause is found), and requires no hardware support beyond a simple shared memory interface. In fact, in our experience, the choice of coherence protocol is much less important than providing efficient hardware feedback that identifies the source of the problem. Future cache-coherence research should focus on illuminating memory system behavior, providing smarter tools to identify bottlenecks, and helping to eliminate them through software optimizations.