ACM Home Page
Please provide us with feedback. Feedback
Exploring parallelization strategies for NUFFT data translation
Full text PdfPdf (823 KB)
Source
International Conference on Compilers, Architecture and Synthesis for Embedded Systems archive
Proceedings of the seventh ACM international conference on Embedded software table of contents
Grenoble, France
SESSION: Multicore, parallel implementations table of contents
Pages 187-196  
Year of Publication: 2009
ISBN:978-1-60558-627-4
Authors
Yuanrui Zhang  The Pennsylvania State University, State College, PA, USA
Mahmut Kandemir  The Pennsylvania State University, State College, PA, USA
Nikos P. Pitsianis  Aristotle University, Thessaloniki, Greece
Xiaobai Sun  Duke University, Durham, NC, USA
Sponsors
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
SIGDA: ACM Special Interest Group on Design Automation
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 15,   Citation Count: 0
Additional Information:

abstract   references   index terms  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1629335.1629361
What is a DOI?

ABSTRACT

This paper introduces parallelization strategies for the Non-Uniform FFT (NUFFT) data translation on multicore architectures. The NUFFT enables the use of the celebrated FFT with un-equally spaced data in numerous situations in signal and image processing as well as in scientific computing. The critical extension lies at the translation of non-equally spaced or non-uniformly sampled data onto an equally spaced Cartesian grid or vice versa. The data translation can be made sufficiently accurate, with the arithmetic complexity linearly proportional to the size of the data ensemble. For large NUFFTs, however, the data translation is found substantially dominant in computation time on modern computers while it is expected to be dominated by the FFT. In order to match the FFT performance achieved by FFTW, data locality and parallelism in the data translation must be explored and exploited as well. We are concerned with two fundamental issues. First, the data translation can be described as a matrix-vector multiplication with a matrix of irregular sparsity. This is beyond the effective scope of the conventional tiling and parallelization schemes applied by a compiler for performance improvement on computation with dense matrices. Secondly, multicore processors exist and emerge in many different configurations, and are expected to evolve further in architectural variety. This may mean the end of performance tuning on a single type of architecture. In this paper, we introduce an automation tool that takes two specifications as input, one on an application-specific data translation algorithm, the other on a target multicore processor architecture. The tool generates a parallel code that explores the data locality and parallelism by utilizing both geometric structures in data translation and the processor-memory configurations in the target architecture. We present preliminary experimental results on both a simulator and a commercial multicore machine. The results show that our parallelization strategy brings significant performance improvement for the NUFFT data translation by efficiently exploiting the data locality and concurrency in the application.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
G. Beylkin. On the fast Fourier transform of functions with singularities. Applied and Computational Harmonic Analysis, 2:363--381, October 1995.
 
2
G. Chen, L. Xue, J. Kim, K. Sobti, L. Deng, X. Sun, N. Pitsianis, C. Chakrabarti, M. Kandemir, and N. Vijaykhshnan. Geometric tiling for reducing power consumption in structured matrix operations. In Proceedings of IEEE International SOC Conference, pages 113--114, September 2006.
 
3
J. Chow, L. Lyon, and V. Sarkar. Automatic parallelization for symmetric shared-memory multiprocessors. In Proceedings of the Conference of the Centre for Advanced Studies on Collaborative Research, page 5, 1996.
 
4
J. Cooley and J. Tukey. An algorithm for the machine computation of complex Fourier series. Mathematics of Computation, 19:297--301, April 1965.
 
5
A. Duijndam and M. Schonewille. Nonunifrom fast Fourier transform. Geophysics, 64:539, April 1999.
 
6
A. Dutt and V. Rokhlin. Fast Fourier transforms for nonequispaced data. SIAM Journal on Scientific Computing, 14:1368--1393, November 1993.
 
7
A. Ernst. Simics hindsight: Reverse execution for software debugging. Virtual Strategy Magazine, May 2005.
 
8
J. A. Fessler and B. P. Sutton. Nonuniform fast Fourier transforms using min-max interpolation. IEEE Trans Sig Proc, 51(2):560--574, Feb. 2003.
 
9
R. Finkel and J. Bentley. Quad trees: A data structure for retrieval on composite keys. Acta Informatica, 4(1):1--9, March 1974.
 
10
F. Franchetti, Y. Voronenko, and M. Puschel. FFT program generation for shared memory: SMP and multicore. In Proceedings of the ACM/IEEE Conference on Supercomputing, page 51, November 2006.
 
11
M. Frigo. A fast Fourier transform compiler. ACM SIGPLAN Notices, 34:169--180, May 1999.
 
12
M. Frigo and S. Johnson. FFTW: An adaptive software architecture for the FFT. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 3, pages 1381--1384, May 1998.
 
13
M. Gupta and P. Banerjee. Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 3:179--193, March 1992.
 
14
 
15
 
16
F. Irigoin and R. Triolet. Supernode partitioning. In Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 319--329, 1988.
 
17
T. Knopp, S. Kunis, and D. Potts. A note on the iterative MRI reconstruction from nonuniform k-space data. International Journal of Biomedical Imaging, 2007, January 2007.
 
18
Q. Liu and N. Nguyen. An accurate algorithm for nonuniform fast Fourier transforms (NUFFTs). IEEE Microwaves and Guided Wave Letters, 8:18--20, January 1998.
 
19
Q. Liu and X. Tang. Iterative algorithm for nonuniform inverse fast Fourier transform (NU-IFFT). Electronics Letters, 34:1913--1914, October 1998.
 
20
E. Paalvast, H. Sips, and A. Gemund. Automatic parallel program generation and optimization form data decompositions. In Proceedings of International Conference on Parallel Processing, 1991.
 
21
M. PÄuschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. In Proceedings of the IEEE special issue on Program Generation, Optimization, and Adaptation, volume 93, pages 232--275, 2005.
 
22
L. Renganarayana and S. Rajopadhye. An approach to SAR imaging by means of non-uniform FFTs. In Proceedings of IEEE International Geoscience and Remote Sensing Symposium, volume 6, pages 4089--4091, July 2003.
 
23
L. Renganarayana and S. Rajopadhye. A geometric programming framework for optimal mutli-level tiling. In Proceedings of the 2004 ACM/IEEE conference on Supercomputing, page 18, May 2004.
 
24
T. Sorensen, T. Schaeffter, K. Noe, and M. Hansen. Accelerating the nonequispaced fast Fourier transform on commodity graphics hardware. IEEE Transactions on Medical Imaging, 27:538--547, April 2008.
 
25
M. Wolfe. More iteration space tiling. In Proceedings of the 1989 ACM/IEEE conference on Supercomputing, pages 655--664, 1989.
 
26
M. Wolfe and M. Lam. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN 1991 conference on Programming Language Design and implementation, pages 30--44, 1991.
 
27
J. Xue. Loop Tiling for Parallelism, volume 575. The Springer International Series in Engineering and Computer Science, 2000.
 
28
S. Ying and J. Kuo. Application of two-dimensional nonuniform fast Fourier transform (2-d nufft) technique to analysis of shielded microstrip circuits. IEEE Transactions on Microwave Theory and Techniques, 53:993--999, March 2005.