ACM Home Page
Please provide us with feedback. Feedback
Scheduling FFT computation on SMP and multicore systems
Full text PdfPdf (461 KB)
Source
International Conference on Supercomputing archive
Proceedings of the 21st annual international conference on Supercomputing table of contents
Seattle, Washington
SESSION: Application optimization table of contents
Pages: 293 - 301  
Year of Publication: 2007
ISBN:978-1-59593-768-1
Authors
Ayaz Ali  University of Houston, Houston, TX
Lennart Johnsson  University of Houston, Houston, TX
Jaspal Subhlok  University of Houston, Houston, TX
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 35,   Downloads (12 Months): 198,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1274971.1275011
What is a DOI?

ABSTRACT

Increased complexity of memory systems to ameliorate the gap between the speed of processors and memory has made it increasingly harder for compilers to optimize an arbitrary code within a palatable amount of time. With the emergence of multicore (CMP), multiprocessor (SMP) and hybrid shared memory multiprocessor architectures, achieving high e ciency is becoming even more challenging. To address the challenge to achieve high e ciency in performance critical applications, domain speci c frameworks have been developed that aid the compilers in scheduling the computations. We have developed a portable framework for the Fast Fourier Transform (FFT) that achieves high e ciency by automatically adapting to various architectural features. Adapting to parallel architectures by searching through all the combinations of schedules (plans) is an expensive task, even when the search is conducted in parallel. In this paper, we develop heuristics to simplify the generation of better schedules for parallel FFT computations on CMP/SMP systems. We evaluate the performance of OpenMP and PThreads implementations of FFT on a number of latest architectures. The performance of parallel FFT schedules is compared with that of the best plan generated for sequential FFT and the speedup for di erent number of processors is reported. In the end, we also present a performance comparison between the UHFFT and FFTW implementations.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Ali, A., Johnsson, L., and Mirkovic, D. Empirical Auto-tuning Code Generator for FFT and Trignometric Transforms. In ODES: 5th Workshop on Optimizations for DSP and Embedded Systems, in conjunction with International Symposium on Code Generation and Optimization (CGO) (San Jose, CA, March 2007).
 
3
 
4
Cooley, J., and Tukey, J. An algorithm for the machine computation of complex fourier series. Mathematics of Computation 19 (1965), 297--301.
5
6
 
7
Frigo, M., and Johnson, S. G. The design and implementation of FFTW3. Proceedings of the IEEE 93, 2 (2005), 216--231. special issue on "Program Generation, Optimization, and Platform Adaptation".
 
8
 
9
10
 
11
 
12
Puschel, M., Moura, J. M. F., Johnson, J., Padua, D., Veloso, M., Singer, B. W., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R. W., and Rizzolo, N. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation" 93, 2 (2005), 232--275.
13
 
14
Temperton, C. Self-Sorting Mixed-Radix Fast Fourier Transforms. Journal of Computational Physics 52 (1983), 1--23.
 
15
Tolimieri, R., An, M., and Lu, C. Algorithms for discrete fourier transform and convolution. Springer-Verlag, 1997.


Collaborative Colleagues:
Ayaz Ali: colleagues
Lennart Johnsson: colleagues
Jaspal Subhlok: colleagues