|
ABSTRACT
While Sequential Consistency (SC) is the most intuitive memory consistency model and the one most programmers likely assume, current multiprocessors do not support it. Instead, they support more relaxed models that deliver high performance. SC implementations are considered either too slow or -- when they can match the performance of relaxed models -- too difficult to implement. In this paper, we propose Bulk Enforcement of SC (BulkSC), anovel way of providing SC that is simple to implement and offers performance comparable to Release Consistency (RC). The idea is to dynamically group sets of consecutive instructions into chunks that appear to execute atomically and in isolation. The hardware enforces SC at the coarse grain of chunks which, to the program, appears as providing SC at the individual memory access level. BulkSC keeps the implementation simple by largely decoupling memory consistency enforcement from processor structures. Moreover, it delivers high performance by enabling full memory access reordering and overlapping within chunks and across chunks. We describe a complete system architecture that supports BulkSC and show that it delivers performance comparable to RC.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
S. V. Adve and K. Gharachorloo, "Shared Memory Consistency Models: A Tutorial," Western Reseach Laboratory-Compaq. Research Report 95/7, September 1995.
|
 |
2
|
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
 |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
H. Chafi, J. Casper, B. D. Carlstrom, A. McDonald, C. Cao Minh, W. Baek, C. Kozyrakis, and K. Olukotun, "A Scalable, Non-blocking Approach to Transactional Memory," in Inter. Symp. on High Performance Computer Architecture, February 2007.
|
| |
10
|
|
 |
11
|
|
| |
12
|
K. Gharachorloo, A. Gupta, and J. L. Hennessy, "Two Techniques to Enhance the Performance of Memory Consistency Models.," in Inter. Conf. on Parallel Processing, August 1991.
|
 |
13
|
Kourosh Gharachorloo , Daniel Lenoski , James Laudon , Phillip Gibbons , Anoop Gupta , John Hennessy, Memory consistency and event ordering in scalable shared-memory multiprocessors, Proceedings of the 17th annual international symposium on Computer Architecture, p.15-26, May 28-31, 1990, Seattle, Washington, United States
|
| |
14
|
|
 |
15
|
Chris Gniady , Babak Falsafi , T. N. Vijaykumar, Is SC + ILP = RC?, Proceedings of the 26th annual international symposium on Computer architecture, p.162-171, May 01-04, 1999, Atlanta, Georgia, United States
|
| |
16
|
A. Gupta, W. Weber, and T. Mowry, "Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes," in Inter. Conference on Parallel Processing, August 1990.
|
 |
17
|
Lance Hammond , Vicky Wong , Mike Chen , Brian D. Carlstrom , John D. Davis , Ben Hertzberg , Manohar K. Prabhu , Honggo Wijaya , Christos Kozyrakis , Kunle Olukotun, Transactional Memory Coherence and Consistency, Proceedings of the 31st annual international symposium on Computer architecture, p.102, June 19-23, 2004, München, Germany
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
L. Lamport, "How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs," IEEE Tran. on Comp., July 1979.
|
| |
22
|
Daniel Lenoski , James Laudon , Kourosh Gharachorloo , Wolf-Dietrich Weber , Anoop Gupta , John Hennessy , Mark Horowitz , Monica S. Lam, The Stanford Dash Multiprocessor, Computer, v.25 n.3, p.63-79, March 1992
[doi> 10.1109/2.121510]
|
| |
23
|
José F. Martínez , Jose Renau , Michael C. Huang , Milos Prvulovic , Josep Torrellas, Cherry: checkpointed early resource recycling in out-of-order microprocessors, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, November 18-22, 2002, Istanbul, Turkey
|
| |
24
|
|
 |
25
|
Vijay S. Pai , Parthasarathy Ranganathan , Sarita V. Adve , Tracy Harton, An evaluation of memory consistency models for shared-memory systems with ILP processors, Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, p.12-23, October 01-04, 1996, Cambridge, Massachusetts, United States
|
 |
26
|
|
 |
27
|
Parthasarathy Ranganathan , Vijay S. Pai , Sarita V. Adve, Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.199-210, June 23-25, 1997, Newport, Rhode Island, United States
[doi> 10.1145/258492.258512]
|
| |
28
|
J. Renau, B. Fraguela, J. Tuck, W. Liu, M. Prvulovic, L. Ceze, S. Sarangi, P. Sack, K. Strauss, and P. Montesinos, "SESC Simulator," January 2005. http://sesc.sourceforge.net.
|
 |
29
|
Srikanth T. Srinivasan , Ravi Rajwar , Haitham Akkary , Amit Gandhi , Mike Upton, Continual flow pipelines, Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, October 07-13, 2004, Boston, MA, USA
|
 |
30
|
Thomas F. Wenisch , Anastasia Ailamaki , Babak Falsafi , Andreas Moshovos, Mechanisms for store-wait-free multiprocessors, Proceedings of the 34th annual international symposium on Computer architecture, June 09-13, 2007, San Diego, California, USA
|
| |
31
|
|
CITED BY 17
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Seth H. Pugsley , Manu Awasthi , Niti Madan , Naveen Muralimanohar , Rajeev Balasubramonian, Scalable and reliable communication for hardware transactional memory, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|