Is SC + ILP = RC?
Source
International Symposium on Computer Architecture
Proceedings of the 26th annual international symposium on Computer architecture
Atlanta, Georgia, United States
Pages: 162 - 171
Year of Publication: 1999
ISBN: 0-7695-0170-2
Authors
Chris Gniady, School of Electrical & Computer Engineering, Purdue University, 1285 EE Building, West Lafayette, IN
Babak Falsafi, School of Electrical & Computer Engineering, Purdue University, 1285 EE Building, West Lafayette, IN
T. N. Vijaykumar, School of Electrical & Computer Engineering, Purdue University, 1285 EE Building, West Lafayette, IN
Publisher
IEEE Computer Society, Washington, DC, USA
Bibliometrics
Downloads (6 Weeks): 11, Downloads (12 Months): 49, Citation Count: 32
ABSTRACT
Sequential consistency (SC) is the simplest programming interface for shared-memory systems but imposes program order among all memory operations, possibly precluding high-performance implementations. Release consistency (RC), however, enables the highest-performance implementations but puts the burden on the programmer to specify which memory operations need to be atomic and in program order. This paper shows, for the first time, that SC implementations can perform as well as RC implementations if the hardware provides enough support for speculation. Both SC and RC implementations rely on reordering and overlapping memory operations for high performance. To enforce order when necessary, an RC implementation uses software guarantees, whereas an SC implementation relies on hardware speculation. Our SC implementation, called SC++, closes the performance gap because: (1) the hardware allows not just loads, as some current SC implementations do, but also stores to bypass each other speculatively to hide remote latencies, (2) the hardware provides large speculative state for not just the processor, as previously proposed, but also memory to allow out-of-order memory operations, (3) the support for hardware speculation does not add excessive overheads to processor pipeline critical paths, and (4) well-behaved applications incur infrequent rollbacks of speculative execution. Using simulation, we show that SC++ achieves an RC implementation's performance in all six applications we studied.
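The program-order requirement that the abstract attributes to SC can be illustrated with the standard "store buffering" litmus test. The sketch below is not taken from the paper: the two-thread program, the register names r0 and r1, and the helper functions are all assumptions chosen for illustration. It enumerates every interleaving that respects program order (the SC case), then repeats the enumeration while letting each thread's two operations be performed in either order, which roughly models a relaxed implementation running unlabeled (non-synchronizing) accesses to different addresses.

```python
from itertools import permutations

# Hypothetical litmus test (not from the paper). Each thread lists its
# operations in program order; x and y both start at 0.
#   ("st", var, value) writes value to var
#   ("ld", var, reg)   reads var into register reg
THREADS = [
    [("st", "x", 1), ("ld", "y", "r0")],
    [("st", "y", 1), ("ld", "x", "r1")],
]

def interleavings(threads):
    """Yield every merge of the threads that keeps each thread's own order."""
    if all(not t for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail

def run(schedule):
    """Execute one global order of operations and return (r0, r1)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return (regs["r0"], regs["r1"])

def sc_outcomes(threads):
    """Outcomes when every thread's operations appear in program order."""
    return {run(s) for s in interleavings(threads)}

def relaxed_outcomes(threads):
    """Outcomes when each thread's operations may be performed in any order.

    This is a deliberately coarse model: the accesses go to different
    addresses and carry no synchronization labels, so nothing orders them.
    """
    results = set()
    for t0 in permutations(threads[0]):
        for t1 in permutations(threads[1]):
            results |= sc_outcomes([list(t0), list(t1)])
    return results

sc = sc_outcomes(THREADS)
relaxed = relaxed_outcomes(THREADS)
print("SC outcomes (r0, r1):     ", sorted(sc))
print("Relaxed outcomes (r0, r1):", sorted(relaxed))
```

Under the SC enumeration the outcome r0 = 0, r1 = 0 never appears, because it would require each load to be performed before the other thread's store, contradicting program order. Once the operations may be reordered, that outcome becomes possible, which is why a relaxed model asks the programmer to label the accesses whose order matters.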
REFERENCES
5. Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. Two techniques to enhance the performance of memory consistency models. In Proceedings of the 1991 International Conference on Parallel Processing (Vol. I Architecture), pages I-355-364, August 1991.
6. Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, John Hennessy, Memory consistency and event ordering in scalable shared-memory multiprocessors, Proceedings of the 17th annual international symposium on Computer Architecture, p.15-26, May 28-31, 1990, Seattle, Washington, United States
8. Alain Kägi, Nagi Aboulenein, Douglas C. Burger, James R. Goodman, Techniques for reducing overheads of shared-memory multiprocessing, Proceedings of the 9th international conference on Supercomputing, p.11-20, July 03-07, 1995, Barcelona, Spain [doi> 10.1145/224538.224540]
9. Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690-691, September 1979.
10. Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, Monica S. Lam, The Stanford Dash Multiprocessor, Computer, v.25 n.3, p.63-79, March 1992 [doi> 10.1109/2.121510]
12. Subbarao Palacharla, Norman P. Jouppi, J. E. Smith, Complexity-effective superscalar processors, Proceedings of the 24th annual international symposium on Computer architecture, p.206-218, June 01-04, 1997, Denver, Colorado, United States
13. Parthasarathy Ranganathan, Vijay S. Pai, Hazim Abdel-Shafi, Sarita V. Adve, The interaction of software prefetching with ILP processors in shared-memory systems, Proceedings of the 24th annual international symposium on Computer architecture, p.144-156, June 01-04, 1997, Denver, Colorado, United States
14. Parthasarathy Ranganathan, Vijay S. Pai, Sarita V. Adve, Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.199-210, June 23-25, 1997, Newport, Rhode Island, United States [doi> 10.1145/258492.258512]
16. Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, Anoop Gupta, The SPLASH-2 programs: characterization and methodological considerations, Proceedings of the 22nd annual international symposium on Computer architecture, p.24-36, June 22-24, 1995, S. Margherita Ligure, Italy
CITED BY 32
Marco Galluzzi, Valentín Puente, Adrián Cristal, Ramón Beivide, José-Ángel Gregorio, Mateo Valero, A first glance at Kilo-instruction based multiprocessors, Proceedings of the 1st conference on Computing frontiers, April 14-16, 2004, Ischia, Italy
Jared C. Smolens, Brian T. Gold, Jangwoo Kim, Babak Falsafi, James C. Hoe, Andreas G. Nowatzyk, Fingerprinting: Bounding Soft-Error-Detection Latency and Bandwidth, IEEE Micro, v.24 n.6, p.22-29, November 2004
Zehra Sura, Xing Fang, Chi-Leung Wong, Samuel P. Midkiff, Jaejin Lee, David Padua, Compiler techniques for high performance sequentially consistent Java programs, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 15-17, 2005, Chicago, IL, USA
Smruti R. Sarangi, Wei Liu, Josep Torrellas, Yuanyuan Zhou, ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.257-270, November 12-16, 2005, Barcelona, Spain
Thomas F. Wenisch, Stephen Somogyi, Nikolaos Hardavellas, Jangwoo Kim, Anastassia Ailamaki, Babak Falsafi, Temporal Streaming of Shared Memory, ACM SIGARCH Computer Architecture News, v.33 n.2, p.222-233, May 2005
Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, Kunle Olukotun, Transactional Memory Coherence and Consistency, ACM SIGARCH Computer Architecture News, v.32 n.2, p.102, March 2004