Load balancing using work-stealing for pipeline parallelism in emerging applications
Source
International Conference on Supercomputing
Proceedings of the 23rd International Conference on Supercomputing, Yorktown Heights, NY, USA
Poster session: Posters, pages 517-518
Year of Publication: 2009
ISBN: 978-1-60558-498-0
ABSTRACT
Parallel programming is a requirement in the multi-core era. One of the most promising techniques for making parallel programming accessible to general users is the use of parallel programming patterns. Functional pipeline parallelism is a well-suited pattern for many emerging applications, such as streaming and "Recognition, Mining and Synthesis" (RMS) workloads. In this paper we develop an analytical model for pipeline parallelism and use it to characterize and optimize two PARSEC benchmarks that use the parallel pipeline pattern: ferret and dedup. We identify two scalability limitations: load imbalance and I/O bottlenecks. We address load imbalance with two techniques: collapsing parallel pipeline stages and dynamic scheduling. We implemented these optimizations using the Pthreads and Threading Building Blocks (TBB) libraries. Comparing the predicted and measured performance of all these implementations on a large-scale SMP machine, we find that the work-stealing TBB implementation outperforms all the other variants.
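To make the load-imbalance and I/O-bottleneck argument concrete, here is a minimal sketch of a generic steady-state bottleneck model. It is not the paper's analytical model, which the abstract does not give. It assumes a ferret/dedup-like shape: a serial input stage, a chain of replicable (parallel) middle stages, and a serial output stage, each with a known average cost per item. The function names and stage costs are hypothetical. Under static assignment each middle stage owns a fixed number of threads and the pipeline runs at the rate of its slowest stage. Collapsing the middle stages into one lets every thread process whole items. We assume here that this also roughly bounds what a dynamic (for example, work-stealing) scheduler can reach, since idle threads are not tied to a lightly loaded stage.

```python
"""Steady-state throughput of a serial -> parallel* -> serial pipeline.

A generic bottleneck model used for illustration only; not the paper's model.
All costs are hypothetical, in arbitrary time units per item.
"""
from typing import Sequence


def static_throughput(in_cost: float, mid_costs: Sequence[float],
                      mid_threads: Sequence[int], out_cost: float) -> float:
    """Items per time unit when each middle stage owns a fixed thread count.

    Serial stages handle one item at a time; middle stage i handles
    mid_threads[i] items concurrently. The slowest stage sets the rate.
    """
    if len(mid_costs) != len(mid_threads):
        raise ValueError("need one thread count per middle stage")
    rates = [1.0 / in_cost, 1.0 / out_cost]
    rates += [n / c for n, c in zip(mid_threads, mid_costs)]
    return min(rates)


def collapsed_throughput(in_cost: float, mid_costs: Sequence[float],
                         mid_threads: int, out_cost: float) -> float:
    """Items per time unit when all middle stages are fused into one stage.

    Every thread runs the whole middle chain for one item, so the fused
    stage costs sum(mid_costs) per item and all threads share that work.
    The serial input/output stages still cap the overall rate.
    """
    return min(1.0 / in_cost, mid_threads / sum(mid_costs), 1.0 / out_cost)


if __name__ == "__main__":
    IN, MID, OUT = 0.5, [1.0, 4.0, 2.0, 1.0], 0.25  # hypothetical stage costs
    print("static, equal split  :", static_throughput(IN, MID, [4, 4, 4, 4], OUT))
    print("static, proportional :", static_throughput(IN, MID, [2, 8, 4, 2], OUT))
    print("collapsed, 16 threads:", collapsed_throughput(IN, MID, 16, OUT))
    print("collapsed, 32 threads:", collapsed_throughput(IN, MID, 32, OUT))
```

With these example costs, an equal 4-4-4-4 split runs at 1.0 item per time unit because the 4.0-cost stage is the bottleneck, while the collapsed version reaches 2.0 with the same 16 threads. Doubling to 32 threads does not help, because the serial input stage then caps throughput at 2.0, which is the kind of I/O bottleneck the abstract mentions. A cost-proportional static split can also reach 2.0, but only if the per-stage costs are known in advance; dynamic scheduling does not need that knowledge.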