Improving superword level parallelism support in modern compilers
Source
International Conference on Hardware Software Codesign
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Jersey City, NJ, USA
Session: SW vs. HW acceleration techniques
Pages: 303-308
Year of Publication: 2005
ISBN: 1-59593-161-9
Authors
Christian Tenllado, Universidad Complutense, Madrid, Spain
Luis Piñuel, Universidad Complutense, Madrid, Spain
Manuel Prieto, Universidad Complutense, Madrid, Spain
Francisco Tirado, Universidad Complutense, Madrid, Spain
F. Catthoor, Interuniversity MicroElectronic Center (IMEC), Leuven, Belgium
ABSTRACT
Multimedia vector instruction sets are becoming ubiquitous in embedded systems for multimedia, networking and communications. However, current compiler technology does not efficiently exploit the data parallelism inherent in many signal processing and multimedia applications. In this paper, we explore the automatic vectorization of embedded applications. In particular, we focus on algorithms that apply the same computations to a set of signals processed simultaneously. This set of signals is usually represented as a 2D array in which each row is an input signal that has to be filtered in some way. A motivating example, inspired by VoIP processing, illustrates that state-of-the-art vectorizing compilers exploit the data parallelism inherent in this kind of application inefficiently. One of the main reasons is that these applications contain inner loops that carry all the dependencies and outer loops with strided memory accesses. We propose a modification of the Superword Level Parallelism (SLP) compiler proposed in [9] that tries to overcome these problems. Experimental results show that our approach clearly outperforms commercial compilers.
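The loop structure described above can be illustrated with a small C sketch. The filter (a first-order recursion y[n] = a*y[n-1] + x[n]), the function names filter_rows and filter_rows_x4, the flat row-major layout and the unroll factor of 4 are all hypothetical choices for illustration, not taken from the paper. The second function is a generic unroll-and-jam rewrite across signals, not the authors' modified SLP algorithm; it only shows why the signal loop is the one worth packing and where the strided accesses come from.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference (hypothetical filter): each row s of the row-major
   nsig x len array x is one signal, filtered with y[n] = a*y[n-1] + x[n],
   assuming y[-1] = 0. */
void filter_rows(size_t nsig, size_t len, const float *x, float *y, float a)
{
    assert(x != NULL && y != NULL);
    for (size_t s = 0; s < nsig; s++) {       /* outer loop: independent signals */
        float prev = 0.0f;
        for (size_t n = 0; n < len; n++) {    /* inner loop: carries the dependence */
            prev = a * prev + x[s * len + n];
            y[s * len + n] = prev;
        }
    }
}

/* Same computation with the signal loop unrolled by 4 and jammed into the
   sample loop. The four updates in the body are isomorphic and independent
   of each other, the shape an SLP-style packer can map onto one 4-wide
   vector operation. The four loads of x are len elements apart (strided),
   not contiguous, which is what makes packing them costly. */
void filter_rows_x4(size_t nsig, size_t len, const float *x, float *y, float a)
{
    size_t s = 0;
    for (; s + 4 <= nsig; s += 4) {
        float p0 = 0.0f, p1 = 0.0f, p2 = 0.0f, p3 = 0.0f;
        for (size_t n = 0; n < len; n++) {
            p0 = a * p0 + x[(s + 0) * len + n];
            p1 = a * p1 + x[(s + 1) * len + n];
            p2 = a * p2 + x[(s + 2) * len + n];
            p3 = a * p3 + x[(s + 3) * len + n];
            y[(s + 0) * len + n] = p0;
            y[(s + 1) * len + n] = p1;
            y[(s + 2) * len + n] = p2;
            y[(s + 3) * len + n] = p3;
        }
    }
    /* Leftover signals (nsig not a multiple of 4) use the scalar loop. */
    filter_rows(nsig - s, len, x + s * len, y + s * len, a);
}
```

Both functions compute the same result; the point of the second is that the dependence chain stays inside each signal while the parallelism lies across signals, so packing must cross the outer loop and gather strided data.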
REFERENCES
[1] ARM11 Family. http://www.arm.com/products/CPUs/families/ARM11Family.html.
[2] A. Bik, M. Girkar, P. Grey, and X. Tian. Efficient exploitation of parallelism on Pentium III and Pentium 4 processor-based systems. Intel Technology Journal, 2001.
[3] Intel Corporation. Intel C/C++ and Intel Fortran Compilers for Linux. Available at http://www.intel.com/software/products/compilers.
[4] S. Fuller. Motorola's AltiVec technology. Technical Report ALTIVECWP/D, Motorola, 1998.
[5]
[6]
[7]
[8] K. Krewell. Cell moves into the limelight. Microprocessor Report, (2/14/05-01), February 2005.
[9]
[10] S. Larsen, E. Witchel, and S. Amarasinghe. Techniques for increasing and detecting memory alignment. Technical Report MIT-LCS-TM-621, MIT, USA, 2001.
[11]
[12]
[13]