ABSTRACT
A multiprocessor's memory consistency model imposes ordering constraints among loads, stores, atomic operations, and memory fences. Even for consistency models that relax ordering among loads and stores, ordering constraints still impose significant performance penalties through atomic operations and memory ordering fences. Several prior proposals reduce the performance penalty of strongly ordered models using post-retirement speculation, but these designs either (1) maintain speculative state at per-store granularity, so storage requirements grow in proportion to speculation depth, or (2) employ distributed global commit arbitration built on unconventional chunk-based invalidation mechanisms. In this paper we propose InvisiFence, an approach to implementing memory ordering based on post-retirement speculation that avoids these concerns. InvisiFence leverages minimalistic post-retirement speculation mechanisms proposed in other contexts to (1) track speculative state efficiently at block granularity, with dedicated storage requirements independent of speculation depth, (2) provide fast commit by avoiding explicit commit arbitration, and (3) operate under a conventional invalidation-based cache coherence protocol. InvisiFence supports both modes of operation found in prior work: speculating only when necessary, which minimizes the risk of rollback-inducing violations, or speculating continuously, which decouples consistency enforcement from the processor core. Overall, InvisiFence requires approximately one kilobyte of additional state to transform a conventional multiprocessor into one that provides performance-transparent memory ordering, fences, and atomic operations.
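To make "block-granularity speculative state" and "commit without arbitration" concrete, here is a toy model in Python. It is a minimal sketch under simplifying assumptions, not InvisiFence's actual hardware design. Each L1 block carries two hypothetical bits, `spec_read` and `spec_written`. A dict stands in for the L2 and memory. Before the first speculative store to a dirty block, the block is written back, so rollback can simply discard speculatively written blocks. Commit flash-clears the bits locally. An incoming invalidation that hits a speculatively accessed block triggers rollback to a register checkpoint. All class, method, and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Block:
    data: int = 0
    dirty: bool = False
    spec_read: bool = False     # hypothetical "speculatively read" bit
    spec_written: bool = False  # hypothetical "speculatively written" bit


class SpecCache:
    """Toy L1 model: all speculative state lives in two bits per cache block."""

    def __init__(self, memory):
        self.memory = memory   # dict address -> value; stands in for L2/memory
        self.blocks = {}       # address -> Block currently held in this L1
        self.speculating = False
        self.checkpoint = None

    def _fill(self, addr):
        if addr not in self.blocks:
            self.blocks[addr] = Block(data=self.memory.get(addr, 0))
        return self.blocks[addr]

    def begin(self, registers):
        """Enter speculation, e.g. at a fence or atomic that would otherwise stall."""
        self.speculating = True
        self.checkpoint = dict(registers)  # register checkpoint for rollback

    def load(self, addr):
        blk = self._fill(addr)
        if self.speculating:
            blk.spec_read = True
        return blk.data

    def store(self, addr, value):
        blk = self._fill(addr)
        if self.speculating:
            if blk.dirty and not blk.spec_written:
                # Clean the block first: memory keeps the last non-speculative value.
                self.memory[addr] = blk.data
                blk.dirty = False
            blk.spec_written = True
        blk.data = value
        blk.dirty = True

    def commit(self):
        """Local flash-clear of the bits; no global commit arbitration in this model."""
        for blk in self.blocks.values():
            blk.spec_read = blk.spec_written = False
        self.speculating = False
        self.checkpoint = None

    def rollback(self):
        """Discard speculatively written blocks and return the register checkpoint."""
        for addr in [a for a, b in self.blocks.items() if b.spec_written]:
            del self.blocks[addr]  # the pre-speculation value is already in memory
        for blk in self.blocks.values():
            blk.spec_read = False
        regs, self.checkpoint, self.speculating = self.checkpoint, None, False
        return regs

    def external_invalidate(self, addr):
        """Invalidation from another core. Returns restored registers on a violation."""
        regs = None
        blk = self.blocks.get(addr)
        if blk is not None and self.speculating and (blk.spec_read or blk.spec_written):
            regs = self.rollback()  # conflict detected: abort speculation first
        blk = self.blocks.pop(addr, None)  # then honor the invalidation
        if blk is not None and blk.dirty:
            self.memory[addr] = blk.data  # only non-speculative dirty data remains
        return regs
```

The point the sketch tries to capture is that the per-block bits cost the same whether speculation covers one store or thousands, unlike a per-store buffer. For example, after `begin`, a speculative store to a dirty block, and a load of another block, an invalidation of that loaded block returns the checkpointed registers. A subsequent load then sees the pre-speculation value.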