ACM Home Page
Please provide us with feedback. Feedback
Automatic SIMD vectorization of chains of recurrences
Full text PdfPdf (272 KB)
Source
International Conference on Supercomputing archive
Proceedings of the 22nd annual international conference on Supercomputing table of contents
Island of Kos, Greece
SESSION: Code performance tuning table of contents
Pages 245-255  
Year of Publication: 2008
ISBN:978-1-60558-158-3
Authors
Yixin Shou  Florida State University, Tallahassee, FL, USA
Robert A. van Engelen  Florida State University, Tallahassee, FL, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 73,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1375527.1375564
What is a DOI?

ABSTRACT

Many computational tasks require repeated evaluation of functions over structured grids, such as plotting in a coordinate system, rendering of parametric objects in 2D and 3D, numerical grid generation, and signal processing. In this paper, we present a method and toolset to speed up closed-form function evaluations over grids by vectorizing Chains of Recurrences (CR). CR forms of closed-form functions require fewer operations to evaluate per grid point. However, the present CR formalism makes CR forms inherently non-vectorizable due to the dependences carried from one point to the next. To address this limitation, we developed a new decoupling method for the CR algebra to translate math functions into Vector Chains of Recurrences (VCR) forms. The VCR coefficients are packed in short vector registers for efficient execution. Performance results of benchmark functions evaluated in single and double precision VCR forms are compared to the Intel compiler's auto-vectorized code and the high-performance small vector math library (SVML). The results show a significant performance increase of our VCR method over SVML and scalar CRs, from doubling the execution speed to running an order of magnitude faster. An auto-tuning tool for VCR is developed for optimal performance and accuracy.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
SWI-Prolog's Home. Available from http://www.swi-prolog.org/.
 
2
Intel Math Kernel Library Reference Manual, March 2007. Intel Document number: 630813-025US.
 
3
4
 
5
 
6
O. Bachmann. Chains of Recurrences. PhD thesis, Kent State University, College of Arts and Sciences, 1996.
7
 
8
 
9
L. Kylex. How to Avoid Bottlenecks in Simple Math Functions. Available from http://softwarecommunity.intel.com/articles/eng/3524.htm, Intel 2004.
 
10
R. van Engelen. Symbolic Evaluation of Chains of Recurrences for Loop Optimization. Technical report, TR-000102, Computer Science Dept., Florida State University, 2000.
 
11
12
 
13
14
 
15
J. E. Volder. The Cordic Trigonometric Computing Technique. IRE Transactions Electronic Computers, EC-8:330--334, September 1959.
 
16
A. K. Yuen. Intel's Floating--Point Processors. In Electro/88 Conference Record, pages 48/5/1--7, 1988.
 
17
 
18
19
 
20

Collaborative Colleagues:
Yixin Shou: colleagues
Robert A. van Engelen: colleagues