ABSTRACT
We believe the paucity of massively parallel, shared-memory machines follows from the lack of a shared-memory programming performance model that can inform programmers of the cost of operations (so they can avoid expensive ones) and can tell hardware designers which cases are common (so they can build simple hardware to optimize them). Cooperative shared memory, our approach to shared-memory design, addresses this problem.
Our initial implementation of cooperative shared memory uses a simple programming model, called Check-In/Check-Out (CICO), in conjunction with even simpler hardware, called Dir1SW. In CICO, programs bracket uses of shared data with a check_out directive marking the expected first use and a check_in directive terminating the expected use of the data. A cooperative prefetch directive helps hide communication latency. Dir1SW is a minimal directory protocol that adds little complexity to message-passing hardware, but efficiently supports programs written within the CICO model.
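The bracketing idea can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the class `ToyDirectory`, its `traps` counter, and the two driver functions are hypothetical names introduced here, and the single-owner bookkeeping is a simplification for illustration, not the Dir1SW protocol itself. The sketch shows why a performance model helps: accesses that follow the check_out/check_in discipline stay on the cheap path, while accesses that skip check_in force a conflict on each hand-off.

```python
class ToyDirectory:
    """Hypothetical single-owner directory; not the paper's Dir1SW protocol."""

    def __init__(self):
        self.owner = {}   # block name -> id of the processor holding it
        self.traps = 0    # conflicting check-outs that would need a slow path

    def check_out(self, proc, block):
        """Mark the expected first use of `block` by `proc`."""
        holder = self.owner.get(block)
        if holder is not None and holder != proc:
            # Another processor never checked the block in: count a conflict.
            self.traps += 1
        self.owner[block] = proc

    def check_in(self, proc, block):
        """Mark the end of the expected use of `block` by `proc`."""
        if self.owner.get(block) == proc:
            del self.owner[block]


def run_cooperative(directory):
    # Each processor brackets its use of "x" with check_out / check_in.
    for proc in (0, 1, 2):
        directory.check_out(proc, "x")
        directory.check_in(proc, "x")
    return directory.traps


def run_uncooperative(directory):
    # Processors never check "x" back in, so every hand-off conflicts.
    for proc in (0, 1, 2):
        directory.check_out(proc, "x")
    return directory.traps
```

In this toy, the conflict count stands in for the expensive operations that a programmer would want to avoid; the real cost structure depends on the hardware and is not modeled here.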