ABSTRACT
Retargetable C compilers are widely used to quickly obtain compiler support for new embedded processors and to perform early processor architecture exploration. A partially inherent problem of the retargetable compilation approach, though, is limited code quality compared to hand-written compilers or assembly code, owing to the lack of dedicated optimization techniques. This problem can be circumvented by designing flexible, retargetable code optimization techniques that apply to a certain range of target architectures. This article focuses on target machines with SIMD instruction support, a common feature in embedded processors for multimedia applications. However, SIMD optimization is known to be a difficult task, since SIMD architectures are largely nonuniform, support only a limited set of data types, and impose several memory alignment constraints. Additionally, such techniques require complicated loop transformations, tailored to the SIMD architecture, to expose the necessary amount of parallelism in the code. Thus, integrating the SIMD optimization and the required loop transformations in a single retargeting formalism is an ambitious challenge. In this article, we present an efficient and quickly retargetable SIMD code optimization framework that is integrated into an industrial retargetable C compiler. Experimental results for different processors demonstrate that the proposed technique applies to real-life target machines and that it produces code quality improvements close to the theoretical limit.