| Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures |
| Full text |
Pdf
(465 KB)
|
Source
|
ACM Symposium on Parallel Algorithms and Architectures
archive
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
table of contents
Calgary, AB, Canada
SESSION: Concurrency mechanisms
table of contents
Pages 91-100
Year of Publication: 2009
ISBN:978-1-60558-606-9
|
|
Authors
|
|
Daniel Spoonhower
|
Carnegie Mellon University, Pittsburgh, PA, USA
|
|
Guy E. Blelloch
|
Carnegie Mellon University, Pittsburgh, PA, USA
|
|
Phillip B. Gibbons
|
Intel Research Pittsburgh, Pittsburgh, PA, USA
|
|
Robert Harper
|
Carnegie Mellon University, Pittsburgh, PA, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 13, Downloads (12 Months): 50, Citation Count: 0
|
|
|
ABSTRACT
Work stealing is a popular method of scheduling fine-grained parallel tasks. The performance of work stealing has been extensively studied, both theoretically and empirically, but primarily for the restricted class of nested-parallel (or fully strict) computations. We extend this prior work by considering a broader class of programs that also supports pipelined parallelism through the use of parallel futures. Though the overhead of work-stealing schedulers is often quantified in terms of the number of steals, we show that a broader metric, the number of deviations, is a better way to quantify work-stealing overhead for less restrictive forms of parallelism, including parallel futures. For such parallelism, we prove bounds on work-stealing overheads--scheduler time and cache misses--as a function of the number of deviations. Deviations can occur, for example, when work is stolen or when a future is touched. We also show instances where deviations can occur independently of steals and touches. Next, we prove that, under work stealing, the expected number of deviations is O(Pd + td) in a P-processor execution of a computation with span d and t touches of futures. Moreover, this bound is existentially tight for any work-stealing scheduler that is parsimonious (those where processors steal only when their queues are empty); this class includes all prior work-stealing schedulers. We also present empirical measurements of the number of deviations incurred by a classic application of futures, Halstead's quicksort, using our parallel implementation of ML. Finally, we identify a family of applications that use futures and, in contrast to quicksort, incur significantly smaller overheads.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe. The data locality of work stealing. Theory of Comput. Syst., 35 (3): 321--347, 2002.
|
 |
2
|
Shivali Agarwal , Rajkishore Barik , Dan Bonachea , Vivek Sarkar , Rudrapatna K. Shyamasundar , Katherine Yelick, Deadlock-free scheduling of X10 computations with bounded resources, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, June 09-11, 2007, San Diego, California, USA
[doi> 10.1145/1248377.1248416]
|
 |
3
|
|
 |
4
|
|
 |
5
|
|
 |
6
|
Guy E. Blelloch , Phillip B. Gibbons , Girija J. Narlikar , Yossi Matias, Space-efficient scheduling of parallelism with synchronization variables, Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, p.12-23, June 23-25, 1997, Newport, Rhode Island, United States
[doi> 10.1145/258492.258494]
|
 |
7
|
|
 |
8
|
|
 |
9
|
|
 |
10
|
Robert D. Blumofe , Christopher F. Joerg , Bradley C. Kuszmaul , Charles E. Leiserson , Keith H. Randall , Yuli Zhou, Cilk: an efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.207-216, July 19-21, 1995, Santa Barbara, California, United States
|
 |
11
|
Robert D. Blumofe , Matteo Frigo , Christopher F. Joerg , Charles E. Leiserson , Keith H. Randall, An analysis of dag-consistent distributed shared-memory algorithms, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.297-308, June 24-26, 1996, Padua, Italy
[doi> 10.1145/237502.237574]
|
 |
12
|
|
| |
13
|
Guojing Cong , Sreedhar Kodali , Sriram Krishnamoorthy , Doug Lea , Vijay Saraswat , Tong Wen, Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing, Proceedings of the 2008 37th International Conference on Parallel Processing, p.536-545, September 09-11, 2008
[doi> 10.1109/ICPP.2008.88]
|
| |
14
|
|
 |
15
|
Matteo Frigo , Charles E. Leiserson , Keith H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.212-223, June 17-19, 1998, Montreal, Quebec, Canada
|
| |
16
|
|
 |
17
|
Michael I. Gordon , William Thies , Saman Amarasinghe, Exploiting coarse-grained task, data, and pipeline parallelism in stream programs, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, October 21-25, 2006, San Jose, California, USA
|
 |
18
|
|
 |
19
|
|
| |
20
|
Intel Corporation. Intel Threading Building Blocks. www.threadingbuildingblocks.org, 2009.
|
 |
21
|
D. A. Kranz , R. H. Halstead, Jr. , E. Mohr, Mul-T: a high-performance parallel Lisp, Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation, p.81-90, June 19-23, 1989, Portland, Oregon, United States
|
 |
22
|
|
| |
23
|
|
 |
24
|
Eric Mohr , David A. Kranz , Robert H. Halstead, Jr., Lazy task creation: a technique for increasing the granularity of parallel programs, Proceedings of the 1990 ACM conference on LISP and functional programming, p.185-197, June 27-29, 1990, Nice, France
[doi> 10.1145/91556.91631]
|
 |
25
|
Daniel Spoonhower , Guy E. Blelloch , Robert Harper , Phillip B. Gibbons, Space profiling for parallel functional programs, Proceeding of the 13th ACM SIGPLAN international conference on Functional programming, September 20-28, 2008, Victoria, BC, Canada
|
 |
26
|
|
|