ACM Home Page
Please provide us with feedback. Feedback
A lightweight in-place implementation for software thread-level speculation
Full text PdfPdf (420 KB)
Source
ACM Symposium on Parallel Algorithms and Architectures archive
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures table of contents
Calgary, AB, Canada
SESSION: Transactional memory II table of contents
Pages 223-232  
Year of Publication: 2009
ISBN:978-1-60558-606-9
Authors
Cosmin E. Oancea  The University of Cambridge, Cambridge, United Kingdom
Alan Mycroft  The University of Cambridge, Cambridge, United Kingdom
Tim Harris  Microsoft Research, Cambridge, United Kingdom
Sponsors
SIGOPS: ACM Special Interest Group on Operating Systems
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 26,   Downloads (12 Months): 73,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1583991.1584050
What is a DOI?

ABSTRACT

Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures the parallel program's behaviour remains true to the language's original sequential semantics; for example, allowing multiple iterations of a loop to run in parallel if there are no conflicts between them.

Conventional software-TLS algorithms detect conflicts dynamically. They suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. TLS implementations can offer disappointing scalability, if threads can only commit speculative work back to the "real" heap sequentially. TLS implementations can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences).

We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. We allow speculative work to be performed in place, so we avoid buffering, and so that reads naturally see earlier writes. We avoid needing a serial-commit protocol. We avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. We achieve an average 77% of the speed-up of manually-parallelized versions of the benchmarks for fully parallel loops. We achieve a maximum of a 5.8x speed-up on an 8-core machine.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
4
5
 
6
7
8
 
9
Intel. Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, chapter 7. In http://download.intel.com/design/processor/manuals/253668.pdf, Sep 2008.
 
10
11
12
 
13
C. E. Oancea and A. Mycroft. Set-Congruence Dynamic Analysis for Software TLS. In Lang. Comp. Par. Comp. (LCPC), Aug 2008.
14
 
15
C. J. F. Pickett and C. Verbrugge. Software Thread Level Speculation for the Java Language and Virtual Machine Environment. In Lang. Comp. Par. Comp. (LCPC), Oct 2005.
 
16
L. Rauchwerger, and N. M. Amato, and D. A. Padua. A Scalable Method for Run-Time Loop Parallelization. In Int. Conf. Supercomputing (ICS), Jul 1995.
 
17
 
18
P. Rundberg and P. Stenström. An All-Software Thread-Level Data Dependence Speculation System for Multiprocessors. The Journal of Instr.-Level Par., 1999.
19
 
20
21
22
23
24
 
25
M. Tremblay, J. Chan, S. Chaudhry, A. W. Conigliaro, and S. S. Tse The MAJC Architecture: A Synthesis of Parallelism and Scalability. In Symp. Microarch. (MICRO), Dec 2000.
26
27
 
28

Collaborative Colleagues:
Cosmin E. Oancea: colleagues
Alan Mycroft: colleagues
Tim Harris: colleagues