ABSTRACT
Barriers, locks, and flags are synchronizing operations widely used by programmers and parallelizing compilers to produce race-free parallel programs. Often, these operations are placed suboptimally, either because of conservative assumptions about the program or merely for code simplicity.

We propose Speculative Synchronization, which applies the philosophy behind Thread-Level Speculation (TLS) to explicitly parallel applications. Speculative threads execute past active barriers, busy locks, and unset flags instead of waiting. The proposed hardware checks for conflicting accesses and, if a violation is detected, the offending speculative thread is rolled back to the synchronization point and restarted on the fly. TLS's principle of always keeping a safe thread is key to our proposal: in any speculative barrier, lock, or flag, the existence of one or more safe threads at all times guarantees forward progress, even in the presence of access conflicts or speculative buffer overflow. Our proposal requires simple hardware and no programming effort. Furthermore, it can coexist with conventional synchronization at run time.

We use simulations to evaluate 5 compiler- and hand-parallelized applications. Our results show a reduction in the time lost to synchronization of 34% on average, and a reduction in overall program execution time of 7.4% on average.
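The rollback-and-restart behavior described above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware mechanism the paper proposes: the names `SpeculativeThread`, `safe_write`, and `critical_section` are hypothetical, conflicts are tracked per address in Python sets rather than by the cache hierarchy, and a single safe thread is assumed to own the lock.

```python
class SpeculativeThread:
    """Toy model of one thread speculating past a busy lock (hypothetical names)."""

    def __init__(self, memory):
        self.memory = memory        # shared dict: address -> value
        self.read_set = set()       # addresses read while speculative
        self.write_buffer = {}      # speculative writes, invisible to other threads
        self.rollbacks = 0

    def read(self, addr):
        self.read_set.add(addr)
        # Forward from our own buffered writes first, then from shared memory.
        return self.write_buffer.get(addr, self.memory[addr])

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def conflicts_with(self, addr):
        # Assumption: a safe thread's write conflicts if we read or wrote that address.
        return addr in self.read_set or addr in self.write_buffer

    def rollback(self):
        # Discard speculative state; the caller re-executes from the sync point.
        self.read_set.clear()
        self.write_buffer.clear()
        self.rollbacks += 1

    def commit(self):
        # Lock released with no pending conflict: buffered writes become visible.
        self.memory.update(self.write_buffer)
        self.read_set.clear()
        self.write_buffer.clear()


def safe_write(memory, addr, value, speculative_threads):
    """The safe thread (lock owner) writes; conflicting speculative threads are squashed."""
    memory[addr] = value
    for thread in speculative_threads:
        if thread.conflicts_with(addr):
            thread.rollback()


def critical_section(thread):
    # Example critical section: increment a shared counter.
    thread.write("counter", thread.read("counter") + 1)


memory = {"counter": 0, "other": 0}
spec = SpeculativeThread(memory)

critical_section(spec)                     # speculate while the lock is still busy
safe_write(memory, "counter", 10, [spec])  # safe thread touches the same data: squash
critical_section(spec)                     # restart from the synchronization point
spec.commit()                              # lock released, no conflict: commit
```

In this model the safe thread never waits on or is undone by a speculative one, which mirrors the forward-progress argument in the abstract: only speculative work is ever discarded.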