Mechanisms for store-wait-free multiprocessors
Source
International Symposium on Computer Architecture
Proceedings of the 34th Annual International Symposium on Computer Architecture
San Diego, California, USA
Session: Memory consistency
Pages: 266 - 277  
Year of Publication: 2007
ISBN:978-1-59593-706-3
Authors
Thomas F. Wenisch  Carnegie Mellon University, Pittsburgh, PA
Anastasia Ailamaki  Carnegie Mellon University, Pittsburgh, PA
Babak Falsafi  Carnegie Mellon University, Pittsburgh, PA
Andreas Moshovos  University of Toronto, Toronto, ON, Canada
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS : Computer Society
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1250662.1250696

ABSTRACT

Store misses cause significant delays in shared-memory multiprocessors because of limited store buffering and ordering constraints required for proper synchronization. Today, programmers must choose from a spectrum of memory consistency models that reduce store stalls at the cost of increased programming complexity. Prior research suggests that the performance gap among consistency models can be closed through speculation--enforcing order only when dynamically necessary. Unfortunately, past designs either provide insufficient buffering, replace all stores with read-modify-write operations, and/or recover from ordering violations via impractical fine-grained rollback mechanisms.
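
The ordering constraints mentioned above can be made concrete with the classic store-buffering litmus test. The sketch below is an illustrative toy model, not taken from the paper: the two-thread program, the function names `sc_outcomes` and `buffered_outcomes`, and the per-thread FIFO buffer model are all assumptions made for this example. It enumerates every outcome under sequential consistency and under a model where stores sit in a private buffer before reaching memory.

```python
from itertools import permutations

# Hypothetical two-thread litmus test (x and y start at 0):
#   T0: x = 1; r0 = y        T1: y = 1; r1 = x
T0 = [("store", "x", 1), ("load", "y", "r0")]
T1 = [("store", "y", 1), ("load", "x", "r1")]


def sc_outcomes():
    """All (r0, r1) results when every access hits memory in program order."""
    outcomes = set()
    for schedule in set(permutations([0, 0, 1, 1])):
        mem, regs, pc = {"x": 0, "y": 0}, {}, [0, 0]
        for tid in schedule:
            op, addr, arg = (T0, T1)[tid][pc[tid]]
            pc[tid] += 1
            if op == "store":
                mem[addr] = arg
            else:
                regs[arg] = mem[addr]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes


def buffered_outcomes():
    """All (r0, r1) results when each thread has a private FIFO store buffer."""
    outcomes = set()

    def step(pc, bufs, mem, regs):
        progressed = False
        for tid, prog in enumerate((T0, T1)):
            if bufs[tid]:
                # Drain the oldest buffered store of this thread to memory.
                progressed = True
                addr, val = bufs[tid][0]
                drained = bufs[:tid] + (bufs[tid][1:],) + bufs[tid + 1:]
                step(pc, drained, {**mem, addr: val}, regs)
            if pc[tid] < len(prog):
                progressed = True
                op, addr, arg = prog[pc[tid]]
                new_pc = pc[:tid] + (pc[tid] + 1,) + pc[tid + 1:]
                if op == "store":
                    # The store retires into the buffer without waiting.
                    grown = bufs[tid] + ((addr, arg),)
                    step(new_pc, bufs[:tid] + (grown,) + bufs[tid + 1:], mem, regs)
                else:
                    # A load forwards from its own buffer, else reads memory.
                    hits = [v for a, v in bufs[tid] if a == addr]
                    val = hits[-1] if hits else mem[addr]
                    step(new_pc, bufs, mem, {**regs, arg: val})
        if not progressed:
            outcomes.add((regs["r0"], regs["r1"]))

    step((0, 0), ((), ()), {"x": 0, "y": 0}, {})
    return outcomes


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
    print(sorted(buffered_outcomes()))
```

In this toy model the result `(0, 0)` appears only in the buffered run. Ruling it out without speculation means a load must wait for earlier stores to drain, which is the kind of ordering-related stall the abstract refers to.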

We propose two mechanisms that, together, enable store-wait-free implementations of any memory consistency model. To eliminate buffer-capacity-related stalls, we propose the scalable store buffer, which places private/speculative values directly into the L1 cache, thereby eliminating the non-scalable associative search of conventional store buffers. To eliminate ordering-related stalls, we propose atomic sequence ordering, which enforces ordering constraints over coarse-grain access sequences while relaxing order among individual accesses. Using cycle-accurate full-system simulation of scientific and commercial applications, we demonstrate that these mechanisms allow the simplified programming of strict ordering while outperforming conventional implementations on average by 32% (sequential consistency), 22% (SPARC total store order) and 9% (SPARC relaxed memory order).
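
To illustrate the contrast drawn above between an associatively searched store buffer and keeping private values in the L1 cache, here is a minimal sketch. It is a simplified software analogy under stated assumptions, not the paper's hardware design: the class names `AssociativeStoreBuffer` and `CacheResidentStores`, the dictionary standing in for the L1, and the order-only FIFO are hypothetical choices made for this example.

```python
class AssociativeStoreBuffer:
    """Conventional model: every load searches all buffered stores."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # (address, value) pairs in program order

    def store(self, addr, value):
        if len(self.entries) >= self.capacity:
            return False  # buffer full: the store would stall
        self.entries.append((addr, value))
        return True

    def load(self, addr, memory):
        # The youngest matching store wins; cost grows with buffer size.
        for a, v in reversed(self.entries):
            if a == addr:
                return v
        return memory.get(addr, 0)


class CacheResidentStores:
    """Toy alternative: values live in the cache; a FIFO only records order."""

    def __init__(self):
        self.l1 = {}     # address -> latest private value (assumed L1 stand-in)
        self.order = []  # (address, value) in program order; loads never search it

    def store(self, addr, value):
        self.l1[addr] = value
        self.order.append((addr, value))
        return True

    def load(self, addr, memory):
        # A single indexed lookup replaces the associative search.
        return self.l1[addr] if addr in self.l1 else memory.get(addr, 0)

    def drain(self, memory):
        # Make the buffered stores globally visible in program order.
        for addr, value in self.order:
            memory[addr] = value
        self.order.clear()


if __name__ == "__main__":
    memory = {}
    sb = AssociativeStoreBuffer(capacity=2)
    print(sb.store(0x10, 1), sb.store(0x10, 2), sb.store(0x20, 3))
    cr = CacheResidentStores()
    for i in range(100):
        cr.store(0x10, i)
    cr.drain(memory)
    print(memory[0x10])
```

The point of the sketch is only that load lookups in the second class do not depend on how many stores are pending, so the ordering FIFO can grow without lengthening the load path. Real hardware would also have to handle evictions, coherence requests, and rollback, none of which is modeled here.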


