ABSTRACT
The Cell Broadband Engine (CBE) is a heterogeneous multi-core processor with unique design properties for high-performance computing. It consists of one Power Processing Element (PPE) and eight Synergistic Processing Elements (SPEs) connected by the Element Interconnect Bus (EIB). It employs novel techniques, such as a software-managed cache, to hide memory latency and guarantee, by default, maximum utilization of the overall system resources. However, exploiting these facilities requires complex algorithm designs and implementations to get the best performance. In this paper we discuss our micro-threading model, realized by a nano-kernel implemented on top of each SPE. The SPE Nano-kernel, or SPENK, employs the micro-threading model to increase the utilization of the CBE resources while simplifying the programming model. Our framework boosted the processor's overall performance by a factor of five compared to the current threading model. It allowed us to build a distributed model for SPE task management and automated Local Storage (LS) management. We further used the micro-threading model to build an event-based programming model on top of the CBE architecture. We tested our framework on two types of algorithms: (1) uniform memory access algorithms, such as parallel summation, and (2) non-uniform or irregular memory access algorithms, specifically tree spanning algorithms. For the first type we obtained up to a threefold performance improvement, and for the second type a fivefold improvement.
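As a rough illustration of the micro-threading idea, the following minimal sketch shows cooperative micro-threads that yield to a small scheduler whenever they request data, so that another micro-thread can run while a transfer would be in flight. It is not SPENK's actual API: the names `summing_thread`, `run_nano_kernel`, and `parallel_sum` are hypothetical, Python generators stand in for SPE micro-threads, and the DMA transfer is simulated by handing the data back on resume.

```python
from collections import deque


def summing_thread(chunk, results, slot):
    """Hypothetical micro-thread: request a chunk, yield while the
    (simulated) transfer is pending, then sum the delivered data."""
    data = yield ("dma_get", chunk)  # hand control back to the scheduler
    results[slot] = sum(data)


def run_nano_kernel(threads):
    """Hypothetical round-robin scheduler. Each micro-thread runs until it
    issues a transfer request; it is resumed later with the data."""
    pending = deque()
    for thread in threads:
        request = next(thread)  # run up to the first transfer request
        pending.append((thread, request))

    switches = 0
    while pending:
        thread, (_op, payload) = pending.popleft()
        switches += 1
        try:
            # Simulated DMA completion: deliver a copy of the data and resume.
            next_request = thread.send(list(payload))
            pending.append((thread, next_request))
        except StopIteration:
            pass  # micro-thread finished
    return switches


def parallel_sum(values, n_threads=4):
    """Split the input into chunks, one micro-thread per chunk, and combine
    the partial sums once every micro-thread has finished."""
    chunk_size = max(1, -(-len(values) // n_threads))  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    results = [0] * len(chunks)
    threads = [summing_thread(c, results, i) for i, c in enumerate(chunks)]
    run_nano_kernel(threads)
    return sum(results)
```

On real hardware the benefit would come from overlapping asynchronous transfers with computation; this sketch only models the control flow of yielding on a memory request and resuming when the data is available.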