ABSTRACT
Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications, where they offer higher performance, lower energy consumption, and better resource utilization. However, compiler support for SIMD instructions remains limited, and code often has to be written by hand in assembly language or with compiler built-in functions. In some applications, higher parallelism could also be achieved if compilers inserted permutation instructions that reorder data in registers. In this paper we describe how we create SIMD instructions from regular code and how we determine the ordering of individual operations within the SIMD instructions so as to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from the SIMD memory operations, and the orderings of operations are propagated from the SIMD memory operations into the graph. We also describe our approach to decomposing a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in a SIMD instruction (the SIMD width) and can be used to compile a number of important kernels, achieving up to 35% speedup.
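As a rough illustration of the grouping step, the following Python sketch packs scalar memory operations into SIMD groups when their effective addresses are adjacent, then reports how the address order of each group differs from program order. It is a minimal sketch under stated assumptions, not the paper's algorithm: the `MemOp` record, the symbolic base plus constant offset address model, and the functions `group_by_address` and `lane_order` are all hypothetical names introduced here.

```python
from collections import defaultdict
from typing import NamedTuple


class MemOp(NamedTuple):
    """Hypothetical scalar memory operation (not a structure from the paper)."""
    name: str    # label of the scalar load or store
    base: str    # symbolic base address, e.g. an array name
    offset: int  # constant element offset from the base


def group_by_address(ops, width):
    """Pack ops with the same base and consecutive offsets into groups of `width`."""
    by_base = defaultdict(list)
    for op in ops:
        by_base[op.base].append(op)

    groups = []
    for base_ops in by_base.values():
        base_ops.sort(key=lambda op: op.offset)
        i = 0
        while i + width <= len(base_ops):
            window = base_ops[i:i + width]
            contiguous = all(
                window[k + 1].offset - window[k].offset == 1
                for k in range(width - 1)
            )
            if contiguous:
                groups.append(window)
                i += width
            else:
                i += 1  # slide past the gap and try again
    return groups


def lane_order(group, ops):
    """For each lane (address order), the rank of its op in program order."""
    in_program_order = sorted(group, key=ops.index)
    return [in_program_order.index(op) for op in group]


# Program order deliberately differs from address order.
ops = [
    MemOp("ld_a1", "a", 1),
    MemOp("ld_a0", "a", 0),
    MemOp("ld_b0", "b", 0),
    MemOp("ld_a3", "a", 3),
    MemOp("ld_a2", "a", 2),
]
groups = group_by_address(ops, width=2)
orders = [lane_order(g, ops) for g in groups]
```

In this sketch a lane order other than the identity marks a group whose consumers may need either a matching operand order or a permutation instruction, which is the cost the ordering propagation described above tries to minimize.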
CITED BY 11
Manuel Hohenauer , Christoph Schumacher , Rainer Leupers , Gerd Ascheid , Heinrich Meyr , Hans van Someren, Retargetable code optimization with SIMD instructions, Proceedings of the 4th international conference on Hardware/software codesign and system synthesis, October 22-25, 2006, Seoul, Korea
Hiroaki Tanaka, Yoshinori Takeuchi, Keishi Sakanushi, Masaharu Imai, Yutaka Ota, Nobu Matsumoto, Masaki Nakagawa, Pack instruction generation for media processors using multi-valued decision diagram, Proceedings of the 4th international conference on Hardware/software codesign and system synthesis, October 22-25, 2006, Seoul, Korea
Stefan Kraemer , Rainer Leupers , Gerd Ascheid , Heinrich Meyr, Interactive presentation: SoftSIMD - exploiting subword parallelism using source code transformations, Proceedings of the conference on Design, automation and test in Europe, April 16-20, 2007, Nice, France
Hiroaki Tanaka , Yoshinori Takeuchi , Keishi Sakanushi , Masaharu Imai , Hiroki Tagawa , Yutaka Ota , Nobu Matsumoto, Generation of Pack Instruction Sequence for Media Processors Using Multi-Valued Decision Diagram, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, v.E90-A n.12, p.2800-2809, December 2007