ACM Home Page
Please provide us with feedback. Feedback
Is SC + ILP = RC?
Full text PdfPdf (95 KB)
Source International Symposium on Computer Architecture archive
Proceedings of the 26th annual international symposium on Computer architecture table of contents
Atlanta, Georgia, United States
Pages: 162 - 171  
Year of Publication: 1999
ISBN:0-7695-0170-2
Also published in ...
Authors
Chris Gniady  School of Electrical & Computer Engineering, Purdue University, 1285 EE Building, West Lafayette, IN
Babak Falsafi  School of Electrical & Computer Engineering, Purdue University, 1285 EE Building, West Lafayette, IN
T. N. Vijaykumar  School of Electrical & Computer Engineering, Purdue University, 1285 EE Building, West Lafayette, IN
Sponsors
IEEE-CS\TCCA : TC on Computer Arhitecture
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 49,   Citation Count: 32
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/300979.300993
What is a DOI?

ABSTRACT

Sequential consistency (SC) is the simplest programming interface for shared-memory systems but imposes program order among all memory operations, possibly precluding high performance implementations. Release consistency (RC), however, enables the highest performance implementations but puts the burden on the programmer to specify which memory operations need to be atomic and in program order. This paper shows, for the first time, that SC implementations can perform as well as RC implementations if the hardware provides enough support for speculation. Both SC and RC implementations rely on reordering and overlapping memory operations for high performance. To enforce order when necessary, an RC implementation uses software guarantees, whereas an SC implementation relies on hardware speculation. Our SC implementation, called SC++, closes the performance gap because: (1) the hardware allows not just loads, as some current SC implementations do, but also stores to bypass each other speculatively to hide remote latencies, (2) the hardware provides large speculative state for not just processor, as previously proposed, but also memory to allow out-of-order memory operations, (3) the support for hardware speculation does not add excessive overheads to processor pipeline critical paths, and (4) well-behaved applications incur infrequent rollbacks of speculative execution. Using simulation, we show that SC++ achieves an RC implementation's performance in all the six applications we studied.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
4
 
5
Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. Two techniques to enhance the performance of memory consistency models. In Proceedings of the 1991 International Conference on Parallel Processing (Vol. I Architecture), pages 1-355-364, August 1991.
6
 
7
8
 
9
Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690-691, September 1979.
 
10
 
11
12
13
14
 
15
16
 
17

CITED BY  32

Collaborative Colleagues:
Chris Gniady: colleagues
Babak Falsafi: colleagues
T. N. Vijaykumar: colleagues