ABSTRACT
Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performance for most applications. The industry has already fallen short of the decades-old trend of doubling performance every 18 months. An attractive approach for exploiting multiple cores is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. However, despite decades of research on automatic parallelization, most techniques are effective only in the scientific and data-parallel domains, where array-dominated codes can be precisely analyzed by the compiler. Thread-level speculation offers the opportunity to expand parallelization to general-purpose programs, but at the cost of expensive hardware support. In this paper, we focus on providing low-overhead software support for exploiting speculative parallelism. We propose STMlite, a lightweight software transactional memory model that is customized to facilitate profile-guided automatic loop parallelization. STMlite eliminates a considerable amount of the checking and locking overhead of conventional software transactional memory models by decoupling the commit phase from main transaction execution. Further, the strong atomicity requirements of generic transactional memories are unnecessary within a stylized automatic parallelization framework. STMlite enables sequential applications to extract meaningful performance gains on commodity multicore hardware.