ABSTRACT
Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high-bandwidth, low-latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model, or event ordering model, for an architecture.
This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on those relevant to scalable architectures.
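To illustrate how classifying accesses relaxes ordering, here is a minimal sketch of a same-processor ordering check. It is not taken from the paper's text: the names `Kind`, `must_stay_ordered`, and `directly_unordered_pairs` are hypothetical. The rules assume the commonly stated release consistency conditions (accesses after an acquire wait for it, and a release waits for the accesses before it), with the simplifying assumption that synchronization accesses stay ordered among themselves. Same-address and control dependencies are ignored.

```python
from enum import Enum
from itertools import combinations


class Kind(Enum):
    ORDINARY = "ordinary"  # regular shared load or store
    ACQUIRE = "acquire"    # synchronization access that gains permission (e.g. lock)
    RELEASE = "release"    # synchronization access that grants permission (e.g. unlock)


def must_stay_ordered(first, second, model):
    """True if `first` must perform before `second` (same processor, program order)."""
    if model == "SC":
        # Sequential consistency: program order is preserved for every pair.
        return True
    if model == "RC":
        both_special = first is not Kind.ORDINARY and second is not Kind.ORDINARY
        return (
            first is Kind.ACQUIRE      # later accesses wait for an earlier acquire
            or second is Kind.RELEASE  # a release waits for all earlier accesses
            or both_special            # assumption: special accesses stay ordered
        )
    raise ValueError(f"unknown model: {model}")


def directly_unordered_pairs(program, model):
    """Index pairs (i, j), i < j, with no direct ordering constraint.

    Transitive constraints through intermediate accesses are not considered.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(program), 2)
        if not must_stay_ordered(a, b, model)
    ]


# A critical section: acquire, two ordinary accesses, release.
critical_section = [Kind.ACQUIRE, Kind.ORDINARY, Kind.ORDINARY, Kind.RELEASE]
```

In this sketch, `directly_unordered_pairs(critical_section, "RC")` reports only the two ordinary accesses inside the critical section as free to overlap, while under `"SC"` no pair is. This is the kind of added buffering and pipelining the abstract refers to.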