| Computer generation of fast fourier transforms for the cell broadband engine |
| Full text |
Pdf
(420 KB)
|
Source
|
International Conference on Supercomputing
archive
Proceedings of the 23rd international conference on Supercomputing
table of contents
Yorktown Heights, NY, USA
SESSION: Applications of the cell processor
table of contents
Pages 26-35
Year of Publication: 2009
ISBN:978-1-60558-498-0
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 28, Downloads (12 Months): 71, Citation Count: 0
|
|
|
ABSTRACT
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has an impressive floating point peak performance, programming and optimizing for it is difficult as it requires explicit memory management, multithreading, streaming, and vectorization. We address this problem for the discrete Fourier transform (DFT) by extending Spiral, a program generation system, to automatically generate highly optimized implementations for the Cell. The extensions include multi-SPE parallelization and explicit memory streaming, both performed at a high abstraction level using rewriting systems operating on Spiral's internal domain-specific language. Further, we support latency and throughput optimizations, single and double precision, and different data formats. The performance of Spiral's computer generated code is comparable with and sometimes better than existing DFT implementations, where available.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
D. A. Bader and V. Agarwal. FFTC: Fastest Fourier transform for the IBM Cell Broadband Engine. In IEEE Intl. Conference on High Performance Computing, pages 172--184, 2007.
|
| |
2
|
S. Chellappa, F. Franchetti, and M. Püschel. FFT program generation for the Cell BE. In International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA), 2008.
|
| |
3
|
A. C. Chow. Fast Fourier transform SIMD code generators for synergistic processor element of Cell processor. In Workshop on Cell Systems and Applications, 2008.
|
| |
4
|
A. C. Chow, G. C. Fossum, and D. A. Brokenshire. A programming example: Large FFT on the Cell Broadband Engine. Technical report, IBM, May 2005.
|
| |
5
|
L. Cico, R. Cooper, and J. Greene. Performance and Programmability of the IBM/Sony/Toshiba Cell Broadband Engine Processor. In Proc. of (EDGE) Workshop, 2006.
|
| |
6
|
FFTW on the cell processor. http://www.fftw.org/cell/.
|
| |
7
|
F. Franchetti, D. McFarlin, F. de Mesmay, H. Shen, T. W. Wlodarczyk, S. Chellappa, M. Telgarsky, P. A. Milder, Y. Voronenko, Q. Yu, J. C. Hoe, J. M. F. Moura, and M. Pueschel. Program generation with Spiral: Beyond transforms. In High Performance Embedded Computing (HPEC), 2008.
|
 |
8
|
|
 |
9
|
|
| |
10
|
F. Franchetti, Y. Voronenko, and M. Püschel. A rewriting system for the vectorization of signal transforms. In High Performance Computing for Computational Science (VECPAR), volume 4395 of Lecture Notes in Computer Science, pages 363--377. Springer, 2006.
|
| |
11
|
M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proc. of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):216--231, 2005.
|
| |
12
|
J. Greene and R. Cooper. A parallel 64K complex FFT algorithm for the ibm/sony/toshiba cell broadband engine processor. In Global Signal Processing Expo (GSPx), 2005.
|
| |
13
|
IBM. Fast Fourier Transform Library Programmer's Guide and API Reference, 3.0 edition, 2008.
|
| |
14
|
|
| |
15
|
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation," 93(2):232--275, 2005.
|
| |
16
|
|
| |
17
|
|
INDEX TERMS
Primary Classification:
I.
Computing Methodologies
I.1
SYMBOLIC AND ALGEBRAIC MANIPULATION
I.1.2
Algorithms
Subjects:
Algebraic algorithms
Additional Classification:
D.
Software
D.3
PROGRAMMING LANGUAGES
D.3.4
Processors
Subjects:
Code generation;
Optimization;
Compilers
F.
Theory of Computation
F.2
ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
F.2.1
Numerical Algorithms and Problems
General Terms:
Algorithms,
Design,
Performance,
Theory
Keywords:
automatic performance tuning,
cell be,
dft,
fast fourier transform,
multibuffering,
multicore,
parallelization,
performance library,
program generation,
streaming
|