| Automatic SIMD vectorization of chains of recurrences |
| Full text |
Pdf
(272 KB)
|
Source
|
International Conference on Supercomputing
archive
Proceedings of the 22nd annual international conference on Supercomputing
table of contents
Island of Kos, Greece
SESSION: Code performance tuning
table of contents
Pages 245-255
Year of Publication: 2008
ISBN:978-1-60558-158-3
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 8, Downloads (12 Months): 73, Citation Count: 0
|
|
|
ABSTRACT
Many computational tasks require repeated evaluation of functions over structured grids, such as plotting in a coordinate system, rendering of parametric objects in 2D and 3D, numerical grid generation, and signal processing. In this paper, we present a method and toolset to speed up closed-form function evaluations over grids by vectorizing Chains of Recurrences (CR). CR forms of closed-form functions require fewer operations to evaluate per grid point. However, the present CR formalism makes CR forms inherently non-vectorizable due to the dependences carried from one point to the next. To address this limitation, we developed a new decoupling method for the CR algebra to translate math functions into Vector Chains of Recurrences (VCR) forms. The VCR coefficients are packed in short vector registers for efficient execution. Performance results of benchmark functions evaluated in single and double precision VCR forms are compared to the Intel compiler's auto-vectorized code and the high-performance small vector math library (SVML). The results show a significant performance increase of our VCR method over SVML and scalar CRs, from doubling the execution speed to running an order of magnitude faster. An auto-tuning tool for VCR is developed for optimal performance and accuracy.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
SWI-Prolog's Home. Available from http://www.swi-prolog.org/.
|
| |
2
|
Intel Math Kernel Library Reference Manual, March 2007. Intel Document number: 630813-025US.
|
| |
3
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
 |
4
|
|
| |
5
|
|
| |
6
|
O. Bachmann. Chains of Recurrences. PhD thesis, Kent State University, College of Arts and Sciences, 1996.
|
 |
7
|
|
| |
8
|
|
| |
9
|
L. Kylex. How to Avoid Bottlenecks in Simple Math Functions. Available from http://softwarecommunity.intel.com/articles/eng/3524.htm, Intel 2004.
|
| |
10
|
R. van Engelen. Symbolic Evaluation of Chains of Recurrences for Loop Optimization. Technical report, TR-000102, Computer Science Dept., Florida State University, 2000.
|
| |
11
|
|
 |
12
|
Robert A. van Engelen , J. Birch , Y. Shou , B. Walsh , Kyle A. Gallivan, A unified framework for nonlinear dependence testing and symbolic analysis, Proceedings of the 18th annual international conference on Supercomputing, June 26-July 01, 2004, Malo, France
[doi> 10.1145/1006209.1006226]
|
| |
13
|
|
 |
14
|
Robert van Engelen , Lex Wolters , Gerard Cats, CTADEL: a generator of multi-platform high performance codes for PDE-based scientific applications, Proceedings of the 10th international conference on Supercomputing, p.86-93, May 25-28, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237578.237589]
|
| |
15
|
J. E. Volder. The Cordic Trigonometric Computing Technique. IRE Transactions Electronic Computers, EC-8:330--334, September 1959.
|
| |
16
|
A. K. Yuen. Intel's Floating--Point Processors. In Electro/88 Conference Record, pages 48/5/1--7, 1988.
|
| |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
|
|