ABSTRACT
Resource management on accelerator-based systems is complicated by the disjoint nature of the main CPU and the accelerator, which involves separate memory hierarchies, different degrees of parallelism, and a relatively high cost of communication between them. For applications with irregular parallelism, where work is dynamically created based on other computations, the accelerators may both consume and produce work. To maintain load balance, the accelerators hand work back to the CPU to be scheduled. In this paper we consider multiple approaches to such scheduling problems and use the Cell BE system to demonstrate the different schedulers and the trade-offs between them. Our evaluation uses both microbenchmarks and two bioinformatics applications (PBPI and RAxML). Our baseline approach uses a standard Linux scheduler on the CPU, possibly with more than one process per CPU. We then consider the addition of cooperative scheduling to the Linux kernel and a user-level work-stealing approach. The two cooperative approaches decrease SPE idle time by 30% and 70%, respectively, relative to the baseline scheduler. In both cases we believe the changes required to application-level codes, e.g., a program written with MPI processes that use accelerator-based compute nodes, are reasonable. The kernel-level approach provides more generality and ease of implementation, but often less performance than the work-stealing approach.