|
ABSTRACT
The FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft "discovered" algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
J. Bilmes , K. Asanovic , Jim Demmel , D. L %A , C. Chin, Optimizing Matrix Multiply using PHiPAC: a Portable,High-Performance, ANSI C Coding Methodology, University of Tennessee, Knoxville, TN, 1996
|
 |
2
|
Jeff Bilmes , Krste Asanovic , Chee-Whye Chin , Jim Demmel, Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Proceedings of the 11th international conference on Supercomputing, p.340-347, July 07-11, 1997, Vienna, Austria
[doi> 10.1145/263580.263662]
|
| |
3
|
Franz Franchetti, Herbert Karner, Stefan Kral, and C. W. Ueberhuber. Architecture independent short vector FFTs. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages 1109--1112, 2001.
|
 |
4
|
Matteo Frigo , Charles E. Leiserson , Keith H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.212-223, June 17-19, 1998, Montreal, Quebec, Canada
|
| |
5
|
|
| |
6
|
Nathaniel A. Kushman. Performance nonmonotonicities: A case study of the UltraSPARC processor. Master's thesis, MIT Department of Electrical Engineering and Computer Science, June 1998.
|
| |
7
|
|
| |
8
|
|
 |
9
|
Jianxin Xiong , Jeremy Johnson , Robert Johnson , David Padua, SPL: a language and compiler for DSP algorithms, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.298-308, June 2001, Snowbird, Utah, United States
|
| |
10
|
|
| |
11
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
 |
12
|
|
 |
13
|
Robert D. Blumofe , Matteo Frigo , Christopher F. Joerg , Charles E. Leiserson , Keith H. Randall, An analysis of dag-consistent distributed shared-memory algorithms, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.297-308, June 24-26, 1996, Padua, Italy
[doi> 10.1145/237502.237574]
|
| |
14
|
{CO75} R. E. Crochiere and A. V. Oppenheim. Analysis of linear digital networks. Proceedings of the IEEE, 63:581--595, April 1975.
|
| |
15
|
{CT65} J. W. Cooley and J. W. Tukey. An algorithm for the machine computation of the complex Fourier series. Mathematics of Computation, 19:297--301, April 1965.
|
| |
16
|
|
| |
17
|
{FJ} Matteo Frigo and Steven G. Johnson. The FFTW web page. http://theory.lcs.mit.edu/~fftw.
|
| |
18
|
|
| |
19
|
{FJ98} Matteo Frigo and Steven G. Johnson. FFTW: An adaptive software architecture for the FFT. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1381--1384, Seattle, WA, May 1998.
|
 |
20
|
Matteo Frigo , Charles E. Leiserson , Keith H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.212-223, June 17-19, 1998, Montreal, Quebec, Canada
|
| |
21
|
|
 |
22
|
|
| |
23
|
{HV92} P. H. Hartel and W. G. Vree. Arrays in a lazy functional language---a case study: the fast Fourier transform. In G. Hains and L. M. R. Mullin, editors, Arrays, functional languages, and parallel systems (ATABLE), pages 52--66, June 1992.
|
| |
24
|
{JB83} H. W. Johnson and C. S. Burrus. The design of optimal DFT algorithms using dynamic programming. IEEE Transactions on Acoustics, Speech and Signal Processing, 31:378--387, April 1983.
|
| |
25
|
{Knu98} Donald E. Knuth. The Art of Computer Programming, volume 2 (Seminumerical Algorithms). Addison-Wesley, 3rd edition, 1998.
|
| |
26
|
{Kul95} Joanna L. Kulik. Implementing compiler optimizations using parallel graph reduction. Master's thesis, Massachussets Institute of Technology, February 1995.
|
| |
27
|
{Ler98} Xavier Leroy. The Objective Caml system release 2.00. Institut National de Recherche en Informatique at Automatique (INRIA), August 1998.
|
| |
28
|
{Mar76} J. A. Maruhn. FOURGEN: a fast Fourier transform program generator. Computer Physics Communications, 12:147--162, 1976.
|
| |
29
|
|
| |
30
|
|
| |
31
|
|
| |
32
|
{PT87} F. Perez and T. Takaoka. A prime factor FFT algorithm implementation using a program generation technique. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(8):1221--1223, August 1987.
|
| |
33
|
{Rad68} C. M. Rader. Discrete Fourier transforms when the number of data samples is prime. Proc. of the IEEE, 56:1107--1108, June 1968.
|
| |
34
|
{SB96} I. Selesnick and C. S. Burrus. Automatic generation of prime length FFT programs. IEEE Transactions on Signal Processing, pages 14--24, January 1996.
|
| |
35
|
{SJHB87} H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus. Real-valued fast Fourier transform algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-35(6):849--863, June 1987.
|
| |
36
|
{TAL97} Richard Tolimieri, Myoung An, and Chao Lu. Algorithms for Discrete Fourier Transform and Convolution. Springer Verlag, 1997.
|
| |
37
|
|
| |
38
|
{VS94a} J. S. Vitter and E. A. M. Shriver. Optimal algorithms for parallel memory I: Two-level memories. Algorithmica, 12(2--3):110--147, 1994. double special issue on Large-Scale Memories.
|
| |
39
|
{VS94b} J. S. Vitter and E. A. M. Shriver. Optimal algorithms for parallel memory II: Hierarchical multilevel memories. Algorithmica, 12(2--3):148--169, 1994. double special issue on Large-Scale Memories.
|
 |
40
|
|
| |
41
|
{Win78} S. Winograd. On computing the discrete Fourier transform. Mathematics of Computation, 32(1):175--199, January 1978.
|
|