|
ABSTRACT
Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures the parallel program's behaviour remains true to the language's original sequential semantics; for example, allowing multiple iterations of a loop to run in parallel if there are no conflicts between them. Conventional software-TLS algorithms detect conflicts dynamically. They suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. TLS implementations can offer disappointing scalability, if threads can only commit speculative work back to the "real" heap sequentially. TLS implementations can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences). We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. We allow speculative work to be performed in place, so we avoid buffering, and so that reads naturally see earlier writes. We avoid needing a serial-commit protocol. We avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. We achieve an average 77% of the speed-up of manually-parallelized versions of the benchmarks for fully parallel loops. We achieve a maximum of a 5.8x speed-up on an 8-core machine.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Vikram S. Adve , John Mellor-Crummey , Mark Anderson , Jhy-Chun Wang , Daniel A. Reed , Ken Kennedy, An integrated compilation and performance analysis environment for data parallel programs, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.50-es, December 04-08, 1995, San Diego, California, United States
[doi> 10.1145/224170.224340]
|
 |
2
|
|
| |
3
|
|
 |
4
|
|
 |
5
|
|
| |
6
|
|
 |
7
|
Lance Hammond , Mark Willey , Kunle Olukotun, Data speculation support for a chip multiprocessor, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.58-69, October 02-07, 1998, San Jose, California, United States
|
 |
8
|
Tim Harris , Mark Plesko , Avraham Shinnar , David Tarditi, Optimizing memory transactions, Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation, June 11-14, 2006, Ottawa, Ontario, Canada
|
| |
9
|
Intel. Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, chapter 7. In http://download.intel.com/design/processor/manuals/253668.pdf, Sep 2008.
|
| |
10
|
|
 |
11
|
Seon Wook Kim , Chong-liang Ooi , Rudolf Eigenmann , Babak Falsafi , T. N. Vijaykumar, Reference idempotency analysis: a framework for optimizing speculative execution, Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, p.2-11, June 2001, Snowbird, Utah, United States
|
 |
12
|
|
| |
13
|
C. E. Oancea and A. Mycroft. Set-Congruence Dynamic Analysis for Software TLS. In Lang. Comp. Par. Comp. (LCPC), Aug 2008.
|
 |
14
|
|
| |
15
|
C. J. F. Pickett and C. Verbrugge. Software Thread Level Speculation for the Java Language and Virtual Machine Environment. In Lang. Comp. Par. Comp. (LCPC), Oct 2005.
|
| |
16
|
L. Rauchwerger, and N. M. Amato, and D. A. Padua. A Scalable Method for Run-Time Loop Parallelization. In Int. Conf. Supercomputing (ICS), Jul 1995.
|
| |
17
|
|
| |
18
|
P. Rundberg and P. Stenström. An All-Software Thread-Level Data Dependence Speculation System for Multiprocessors. The Journal of Instr.-Level Par., 1999.
|
 |
19
|
|
| |
20
|
|
 |
21
|
Bratin Saha , Ali-Reza Adl-Tabatabai , Richard L. Hudson , Chi Cao Minh , Benjamin Hertzberg, McRT-STM: a high performance software transactional memory system for a multi-core runtime, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
[doi> 10.1145/1122971.1123001]
|
 |
22
|
Susmit Sarkar , Peter Sewell , Francesco Zappa Nardelli , Scott Owens , Tom Ridge , Thomas Braibant , Magnus O. Myreen , Jade Alglave, The semantics of x86-CC multiprocessor machine code, Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, January 21-23, 2009, Savannah, GA, USA
|
 |
23
|
|
 |
24
|
J. Greggory Steffan , Christopher B. Colohan , Antonia Zhai , Todd C. Mowry, A scalable approach to thread-level speculation, Proceedings of the 27th annual international symposium on Computer architecture, p.1-12, June 2000, Vancouver, British Columbia, Canada
|
| |
25
|
M. Tremblay, J. Chan, S. Chaudhry, A. W. Conigliaro, and S. S. Tse The MAJC Architecture: A Synthesis of Parallelism and Scalability. In Symp. Microarch. (MICRO), Dec 2000.
|
 |
26
|
Adam Welc , Suresh Jagannathan , Antony Hosking, Safe futures for Java, Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, October 16-20, 2005, San Diego, CA, USA
|
 |
27
|
|
| |
28
|
|
|