ACM Home Page
Please provide us with feedback. Feedback
Computer generation of fast fourier transforms for the cell broadband engine
Full text PdfPdf (420 KB)
Source
International Conference on Supercomputing archive
Proceedings of the 23rd international conference on Supercomputing table of contents
Yorktown Heights, NY, USA
SESSION: Applications of the cell processor table of contents
Pages 26-35  
Year of Publication: 2009
ISBN:978-1-60558-498-0
Authors
Srinivas Chellappa  Carnegie Mellon University, Pittsburgh, PA, USA
Franz Franchetti  Carnegie Mellon University, Pittsburgh, PA, USA
Markus Püeschel  Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 28,   Downloads (12 Months): 71,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1542275.1542285
What is a DOI?

ABSTRACT

The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has an impressive floating point peak performance, programming and optimizing for it is difficult as it requires explicit memory management, multithreading, streaming, and vectorization. We address this problem for the discrete Fourier transform (DFT) by extending Spiral, a program generation system, to automatically generate highly optimized implementations for the Cell. The extensions include multi-SPE parallelization and explicit memory streaming, both performed at a high abstraction level using rewriting systems operating on Spiral's internal domain-specific language. Further, we support latency and throughput optimizations, single and double precision, and different data formats. The performance of Spiral's computer generated code is comparable with and sometimes better than existing DFT implementations, where available.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. A. Bader and V. Agarwal. FFTC: Fastest Fourier transform for the IBM Cell Broadband Engine. In IEEE Intl. Conference on High Performance Computing, pages 172--184, 2007.
 
2
S. Chellappa, F. Franchetti, and M. Püschel. FFT program generation for the Cell BE. In International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA), 2008.
 
3
A. C. Chow. Fast Fourier transform SIMD code generators for synergistic processor element of Cell processor. In Workshop on Cell Systems and Applications, 2008.
 
4
A. C. Chow, G. C. Fossum, and D. A. Brokenshire. A programming example: Large FFT on the Cell Broadband Engine. Technical report, IBM, May 2005.
 
5
L. Cico, R. Cooper, and J. Greene. Performance and Programmability of the IBM/Sony/Toshiba Cell Broadband Engine Processor. In Proc. of (EDGE) Workshop, 2006.
 
6
FFTW on the cell processor. http://www.fftw.org/cell/.
 
7
F. Franchetti, D. McFarlin, F. de Mesmay, H. Shen, T. W. Wlodarczyk, S. Chellappa, M. Telgarsky, P. A. Milder, Y. Voronenko, Q. Yu, J. C. Hoe, J. M. F. Moura, and M. Pueschel. Program generation with Spiral: Beyond transforms. In High Performance Embedded Computing (HPEC), 2008.
8
9
 
10
F. Franchetti, Y. Voronenko, and M. Püschel. A rewriting system for the vectorization of signal transforms. In High Performance Computing for Computational Science (VECPAR), volume 4395 of Lecture Notes in Computer Science, pages 363--377. Springer, 2006.
 
11
M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proc. of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):216--231, 2005.
 
12
J. Greene and R. Cooper. A parallel 64K complex FFT algorithm for the ibm/sony/toshiba cell broadband engine processor. In Global Signal Processing Expo (GSPx), 2005.
 
13
IBM. Fast Fourier Transform Library Programmer's Guide and API Reference, 3.0 edition, 2008.
 
14
 
15
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation," 93(2):232--275, 2005.
 
16
 
17

Collaborative Colleagues:
Srinivas Chellappa: colleagues
Franz Franchetti: colleagues
Markus Püeschel: colleagues