|
ABSTRACT
The FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft "discovered" algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
ACT90
|
|
| |
ASU86
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
 |
AV88
|
|
 |
BFJ+96
|
Robert D. Blumofe , Matteo Frigo , Christopher F. Joerg , Charles E. Leiserson , Keith H. Randall, An analysis of dag-consistent distributed shared-memory algorithms, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.297-308, June 24-26, 1996, Padua, Italy
[doi> 10.1145/237502.237574]
|
| |
CO75
|
R.E. Crochiere and A. V. Oppenheim. Analysis of linear digital networks. Proceedings of the IEEE, 63:581-595, April 1975.
|
| |
CT65
|
J.W. Cooley and J. W. 'Ihkey. An algorithm for the machine computation of the complex Fourier series. Mathematics of Computation, 19:297-301, April 1965.
|
| |
DV90
|
|
| |
FJ
|
Matteo Frigo and Steven G. Johnson. The FFTW web page. http://theory, lcs .air. edu/'fftw.
|
| |
FJ97
|
|
| |
FJ98
|
Matteo Frigo and Steven G. Johnson. FFTW: An adaptive software architecture for the FFT, In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1381- 1384, Seattle, WA, May 1998.
|
 |
FLR98
|
Matteo Frigo , Charles E. Leiserson , Keith H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.212-223, June 17-19, 1998, Montreal, Quebec, Canada
|
| |
GHSJ96
|
|
 |
HK81
|
|
| |
HV92
|
P.H. Hartel and W. G. Vree. Arrays in a lazy functional language---a case study: the fast Fourier transform. In G. Hains and L. M. R. Mullin, editors, Arrays, functional languages, and parallel systems (ATABLE), pages 52-66, June 1992.
|
| |
JB83
|
H.W. Johnson and C. S. Bums. The design of optimal DFT algorithms using dynamic programming. IEEE Transactions on Acoustics, Speech and Signal Processing, 31:378-387, April 1983.
|
| |
Knu98
|
|
| |
Kul95
|
Joanna L. Kulik. Implementing compiler optimizations using parallel graph reduction. Master's thesis, Massachussets Institute of Technology, February 1995.
|
| |
Ler98
|
Xavier Leroy. The Objective Carol system release 2.00. Institut National de Recherche en Informatique at Automatique (INRIA), August 1998.
|
| |
Mar76
|
J.A. Maruhn. FOURGEN: a fast Fourier transform program generator. Computer Physics Communications, 12:147-162, 1976.
|
| |
Muc97
|
|
| |
OS89
|
|
| |
Par92
|
|
| |
PT87
|
F. Perez and T. Takaoka. A prime factor FF'T algorithm implementation using a program generation technique. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(8): 1221-1223, August 1987.
|
| |
Rad68
|
C.M. Rader. Discrete Fourier transforms when the number of data samples is prime. Proc. of the IEEE, 56:1107-1108, June 1968.
|
| |
SB96
|
i. Selesnick and C. S. Burrus. Automatic generation of prime length FFr programs. IEEE Transactions on Signal Processing, pages 14-24, January 1996.
|
| |
SJHB87
|
H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus. Real-valued fast Fourier transform algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-35(6):849-863, June 1987.
|
| |
TAL97
|
Richard Tolimieri, Myoung An, and Chao Lu. Algorithms for Discrete Fourier Transform and Convolution. Springer Verlag, 1997.
|
| |
Vel95
|
|
| |
VS94a
|
J.S. Vitter and E. A. M. Shriver. Optimal algorithms for parallel memory I: Two-level memories. Algorithmica, 12(2-3):110-147, 1994. double special issue on Large- Scale Memories.
|
| |
VS94b
|
J.S. Vitter and E. A. M. Shriver. Optimal algorithms for parallel memory II: Hierarchical multilevel memories. Algorithrnica, 12(2-3):148-169, 1994. double special issue on Large-Scale Memories.
|
 |
Wad97
|
|
| |
Win78
|
S. Winograd. On computing the discrete Fourier transform. Mathematics of Computation, 32(1):175-199, January 1978.
|
CITED BY 41
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Markus Püschel , José M. F. Moura , Bryan Singer , Jianxin Xiong , Jeremy Johnson , David Padua , Manuela Veloso , Robert W. Johnson, Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms, International Journal of High Performance Computing Applications, v.18 n.1, p.21-45, February 2004
|
|
|
L. Almagor , Keith D. Cooper , Alexander Grosul , Timothy J. Harvey , Steven W. Reeves , Devika Subramanian , Linda Torczon , Todd Waterman, Finding effective compilation sequences, ACM SIGPLAN Notices, v.39 n.7, July 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Timothy J. Knight , Ji Young Park , Manman Ren , Mike Houston , Mattan Erez , Kayvon Fatahalian , Alex Aiken , William J. Dally , Pat Hanrahan, Compilation for explicitly managed memory hierarchies, Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, March 14-17, 2007, San Jose, California, USA
|
|
|
|
|
|
|
|
|
Kayvon Fatahalian , Daniel Reiter Horn , Timothy J. Knight , Larkhoon Leem , Mike Houston , Ji Young Park , Mattan Erez , Manman Ren , Alex Aiken , William J. Dally , Pat Hanrahan, Memory---Sequoia: programming the memory hierarchy, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, November 11-17, 2006, Tampa, Florida
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mike Houston , Ji-Young Park , Manman Ren , Timothy Knight , Kayvon Fatahalian , Alex Aiken , William Dally , Pat Hanrahan, A portable runtime interface for multi-level memory hierarchies, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Melina Demertzi , Pedro C. Diniz , Mary W. Hall , Anna C. Gilbert , Yi Wang, Computation reuse in domain-specific optimization of signal recognition, Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays, February 22-24, 2009, Monterey, California, USA
|
|
|
|
|
|
Hans Zima , Mary Hall , Chun Chen , Jaqueline Chame, Model-guided autotuning of high-productivity languages for petascale computing, Proceedings of the 18th ACM international symposium on High performance distributed computing, p.151-166, June 11-13, 2009, Garching, Germany
|
|
|
|
|
|
|
|