ACM Home Page
Please provide us with feedback. Feedback
Idempotent work stealing
Full text PdfPdf (381 KB)
Source
Principles and Practice of Parallel Programming archive
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming table of contents
Raleigh, NC, USA
SESSION: Task mapping and scheduling table of contents
Pages 45-54  
Year of Publication: 2009
ISBN:978-1-60558-397-6
Also published in ...
Authors
Maged M. Michael  IBM Thomas J. Watson Research Center, Yorktown Height, NY, USA
Martin T. Vechev  IBM Thomas J. Watson Research Center, Hawthorne, NY, USA
Vijay A. Saraswat  IBM Thomas J. Watson Research Center, Hawthorne, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 53,   Downloads (12 Months): 300,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1504176.1504186
What is a DOI?

ABSTRACT

Load balancing is a technique which allows efficient parallelization of irregular workloads, and a key component of many applications and parallelizing runtimes. Work-stealing is a popular technique for implementing load balancing, where each parallel thread maintains its own work set of items and occasionally steals items from the sets of other threads.

The conventional semantics of work stealing guarantee that each inserted task is eventually extracted exactly once. However, correctness of a wide class of applications allows for relaxed semantics, because either: i) the application already explicitly checks that no work is repeated or ii) the application can tolerate repeated work.

In this paper, we introduce idempotent work tealing, and present several new algorithms that exploit the relaxed semantics to deliver better performance. The semantics of the new algorithms guarantee that each inserted task is eventually extracted at least once-instead of exactly once.

On mainstream processors, algorithms for conventional work stealing require special atomic instructions or store-load memory ordering fence instructions in the owner's critical path operations. In general, these instructions are substantially slower than regular memory access instructions. By exploiting the relaxed semantics, our algorithms avoid these instructions in the owner's operations.

We evaluated our algorithms using common graph problems and micro-benchmarks and compared them to well-known conventional work stealing algorithms, the THE Cilk and Chase-Lev algorithms. We found that our best algorithm (with LIFO extraction) outperforms existing algorithms in nearly all cases, and often by significant margins.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
4
5
6
 
7
8
9
 
10
11
 
12
IBM System/370 Extended Architecture, Principles of Operation, 1983. Publication No. SA22-7085.
 
13
Doug Lea. The JSR-133 Cookbook for Compiler Writers. Web page.
 
14
Maged M. Michael. Practical lock-free and wait-free LL/SC/VL implementations using 64-bit CAS. In Proceedings of the Eighteenth International Conference on Distributed Computing, DISC, pages 144--158, October 2004.


Collaborative Colleagues:
Maged M. Michael: colleagues
Martin T. Vechev: colleagues
Vijay A. Saraswat: colleagues