|
ABSTRACT
This paper is motivated by the difficulty in writing correct high-performance programs. Writing shared-memory multi-threaded programs imposes a complex trade-off between programming ease and performance, largely due to subtleties in coordinating access to shared data. To ensure correctness programmers often rely on conservative locking at the expense of performance. The resulting serialization of threads is a performance bottleneck. Locks also interact poorly with thread scheduling and faults, resulting in poor system performance.We seek to improve multithreaded programming trade-offs by providing architectural support for optimistic lock-free execution. In a lock-free execution, shared objects are never locked when accessed by various threads. We propose Transactional Lock Removal (TLR) and show how a program that uses lock-based synchronization can be executed by the hardware in a lock-free manner, even in the presence of conflicts, without programmer support or software changes. TLR uses timestamps for conflict resolution, modest hardware, and features already present in many modern computer systems.TLR's benefits include improved programmability, stability, and performance. Programmers can obtain benefits of lock-free data structures, such as non-blocking behavior and wait-freedom, while using lock-protected critical sections for writing programs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. R. Alameldeen, C. J. Mauer, M. Xu, P. J. Harper, M. M. Martin, D. J. Sorin, M. D. Hill, and D. A. Wood. Evaluating non-deterministic multi-threaded commercial workloads. In Fifth Workshop on Computer Architecture Evaluation Using Commercial Workloads, pages 30-38, Feb. 2002.
|
 |
2
|
|
| |
3
|
|
 |
4
|
|
 |
5
|
|
| |
6
|
B. N. Bershad. Practical considerations for lock-free concurrent objects. Technical Report CMU-CS-91-183, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Sept. 1991.
|
 |
7
|
|
| |
8
|
K. Gharachorloo, A. Gupta, and J. L. Hennessy. Two techniques to enhance the performance of memory consistency models. In Proceedings of the 1991 International Conference on Parallel Processing, pages 355-364, Aug. 1991.
|
 |
9
|
James R. Goodman , Mary K. Vernon , Philip J. Woest, Efficient synchronization primitives for large-scale cache-coherent multiprocessors, Proceedings of the third international conference on Architectural support for programming languages and operating systems, p.64-75, April 03-06, 1989, Boston, Massachusetts, United States
|
| |
10
|
J. Gray. The transaction concept: Virtues and limitations. In Seventh International Conference on Very Large Data Bases, 144-154, Sept. 1981.
|
 |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
E. H. Jensen, G. W. Hagensen, and J. M. Broughton. A new approach to exclusive data access in shared memory multiprocessors. Technical Report UCRL-97663, Lawrence Livermore National Laboratory, Livermore, CA, Nov. 1987.
|
 |
15
|
|
 |
16
|
Alain Kägi , Doug Burger , James R. Goodman, Efficient synchronization: let them eat QOLB, Proceedings of the 24th annual international symposium on Computer architecture, p.170-180, June 01-04, 1997, Denver, Colorado, United States
|
| |
17
|
|
 |
18
|
|
 |
19
|
Sanjeev Kumar , Dongming Jiang , Rohit Chandra , Jaswinder Pal Singh, Evaluating synchronization on shared address space multiprocessors: methodology and performance, Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, p.23-34, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
20
|
|
 |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
J. F. Martínez and J. Torrellas. Speculative locks for concurrent execution of critical sections in shared-memory multiprocessors. In Workshop on Memory Performance Issues, June 2001.
|
 |
25
|
|
 |
26
|
|
 |
27
|
Maged M. Michael , Michael L. Scott, Simple, fast, and practical non-blocking and blocking concurrent queue algorithms, Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing, p.267-275, May 23-26, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/248052.248106]
|
| |
28
|
|
| |
29
|
|
| |
30
|
|
| |
31
|
R. Rajwar, A. Kägi, and J. R. Goodman. Improving the throughput of synchronization by insertion of delays. In Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, pages 168-179, Jan. 2000.
|
 |
32
|
|
 |
33
|
|
 |
34
|
|
| |
35
|
A. Singhal, D. Broniarczyk, F. M. Cerauskis, J. Price, L. Yuan, G. Cheng, D. Doblar, S. Fosth, N. Agarwal, K. Harvey, and E. Hagersten. Gigaplane: A high performance bus for large SMPs. In Proceedings of the Symposium on High Performance Interconnects IV, pages 41-52, Aug. 1996.
|
| |
36
|
|
 |
37
|
John Turek , Dennis Shasha , Sundeep Prakash, Locking without blocking: making lock based concurrent data structure algorithms nonblocking, Proceedings of the eleventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.212-222, June 02-05, 1992, San Diego, California, United States
[doi> 10.1145/137097.137873]
|
| |
38
|
|
 |
39
|
Steven Cameron Woo , Moriyoshi Ohara , Evan Torrie , Jaswinder Pal Singh , Anoop Gupta, The SPLASH-2 programs: characterization and methodological considerations, Proceedings of the 22nd annual international symposium on Computer architecture, p.24-36, June 22-24, 1995, S. Margherita Ligure, Italy
|
CITED BY 49
|
|
|
|
|
|
|
|
Sanjeev Kumar , Michael Chu , Christopher J. Hughes , Partha Kundu , Anthony Nguyen, Hybrid transactional memory, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
|
|
|
Cliff Jones , David Lomet , Alexander Romanovsky , Gerhard Weikum , Alan Fekete , Marie-Claude Gaudel , Henry F. Korth , Rogerio de Lemos , Eliot Moss , Ravi Rajwar , Krithi Ramamritham , Brian Randell , Luis Rodrigues, The atomic manifesto: a story in four quarks, ACM SIGOPS Operating Systems Review, v.39 n.2, p.41-46, April 2005
|
|
|
Cliff Jones , David Lomet , Alexander Romanovsky , Gerhard Weikum , Alan Fekete , Marie-Claude Gaudel , Henry F. Korth , Rogerio de Lemos , Eliot Moss , Ravi Rajwar , Krithi Ramamritham , Brian Randell , Luis Rodrigues, The atomic manifesto: a story in four quarks, ACM SIGMOD Record, v.34 n.1, March 2005
|
|
|
|
|
|
Lance Hammond , Brian D. Carlstrom , Vicky Wong , Ben Hertzberg , Mike Chen , Christos Kozyrakis , Kunle Olukotun, Programming with transactional coherence and consistency (TCC), ACM SIGOPS Operating Systems Review, v.38 n.5, December 2004
|
|
|
Lance Hammond , Brian D. Carlstrom , Vicky Wong , Michael Chen , Christos Kozyrakis , Kunle Olukotun, Transactional Coherence and Consistency: Simplifying Parallel Hardware and Software, IEEE Micro, v.24 n.6, p.92-103, November 2004
|
|
|
|
|
|
Michael A. Bender , Martin Farach-Colton , Simai He , Bradley C. Kuszmaul , Charles E. Leiserson, Adversarial contention resolution for simple channels, Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures, July 18-20, 2005, Las Vegas, Nevada, USA
|
|
|
Tim Harris , Simon Marlow , Simon Peyton-Jones , Maurice Herlihy, Composable memory transactions, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 15-17, 2005, Chicago, IL, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
JaeWoong Chung , Chi Cao Minh , Austen McDonald , Travis Skare , Hassan Chafi , Brian D. Carlstrom , Christos Kozyrakis , Kunle Olukotun, Tradeoffs in transactional memory virtualization, ACM SIGPLAN Notices, v.41 n.11, November 2006
|
|
|
|
|
|
|
|
|
Brian D. Carlstrom , JaeWoong Chung , Hassan Chafi , Austen McDonald , Chi Cao Minh , Lance Hammond , Christos Kozyrakis , Kunle Olukotun, Executing Java programs with transactional memory, Science of Computer Programming, v.63 n.2, p.111-129, 1 December 2006
|
|
|
|
|
|
Yao Guo , Vladimir Vlassov , Raksit Ashok , Richard Weiss , Csaba Andras Moritz, Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization, Journal of Parallel and Distributed Computing, v.68 n.2, p.165-181, February, 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jayaram Bobba , Kevin E. Moore , Haris Volos , Luke Yen , Mark D. Hill , Michael M. Swift , David A. Wood, Performance pathologies in hardware transactional memory, ACM SIGARCH Computer Architecture News, v.35 n.2, May 2007
|
|
|
Austen McDonald , JaeWoong Chung , Brian D. Carlstrom , Chi Cao Minh , Hassan Chafi , Christos Kozyrakis , Kunle Olukotun, Architectural Semantics for Practical Transactional Memory, ACM SIGARCH Computer Architecture News, v.34 n.2, p.53-65, May 2006
|
|
|
|
|
|
Lance Hammond , Vicky Wong , Mike Chen , Brian D. Carlstrom , John D. Davis , Ben Hertzberg , Manohar K. Prabhu , Honggo Wijaya , Christos Kozyrakis , Kunle Olukotun, Transactional Memory Coherence and Consistency, ACM SIGARCH Computer Architecture News, v.32 n.2, p.102, March 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Seth H. Pugsley , Manu Awasthi , Niti Madan , Naveen Muralimanohar , Rajeev Balasubramonian, Scalable and reliable communication for hardware transactional memory, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|