Improving superword level parallelism support in modern compilers
Source: International Conference on Hardware Software Codesign
Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Jersey City, NJ, USA
Session: SW vs. HW acceleration techniques
Pages: 303-308
Year of Publication: 2005
ISBN:1-59593-161-9
Authors
Christian Tenllado  Universidad Complutense, Madrid, Spain
Luis Piñuel  Universidad Complutense, Madrid, Spain
Manuel Prieto  Universidad Complutense, Madrid, Spain
Francisco Tirado  Universidad Complutense, Madrid, Spain
F. Catthoor  Interuniversity MicroElectronic Center (IMEC), Leuven, Belgium
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
SIGBED: ACM Special Interest Group on Embedded Systems
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1084834.1084909

ABSTRACT

Multimedia vector instruction sets are becoming ubiquitous in most embedded systems used for multimedia, networking, and communications. However, current compiler technology does not allow efficient exploitation of the inherent data parallelism available in many signal processing and multimedia applications. In this paper, we explore the automatic vectorization of embedded applications. In particular, we focus on algorithms in which the same computations are applied to a set of signals that are processed simultaneously. Usually this set of signals is represented as a 2D array in which each row is an input signal that has to be filtered in some way. A motivating example, inspired by VoIP processing, illustrates that state-of-the-art vectorizing compilers exploit the data parallelism inherent in this kind of application inefficiently. One of the main reasons is that these applications present inner loops that carry all the dependencies and outer loops with strided memory accesses. We propose a modification of the Superword Level Parallelism (SLP) compiler proposed in [9] that tries to overcome these problems. Experimental results show that our approach clearly outperforms commercial compilers.
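The loop structure described in the abstract can be illustrated with a minimal C sketch. It is not taken from the paper: the first-order recursive filter, the function names, and the sizes NSIG and NSAMP are all assumptions chosen for illustration. The first function shows the natural loop order, where the inner loop carries the dependence; the second interchanges the loops, which exposes parallelism across signals but makes the memory accesses strided.

```c
#include <assert.h>
#include <stddef.h>

#define NSIG  4   /* number of signals (rows); hypothetical value */
#define NSAMP 8   /* samples per signal (columns); hypothetical value */

/* Hypothetical first-order recursive filter applied to every row:
 *   y[s][n] = x[s][n] + a * y[s][n-1]
 * Natural loop order: the inner loop over n carries the dependence on
 * y[s][n-1], so it cannot be vectorized directly. */
void filter_rows(float y[NSIG][NSAMP], float x[NSIG][NSAMP], float a)
{
    for (size_t s = 0; s < NSIG; s++) {        /* iterations are independent */
        y[s][0] = x[s][0];
        for (size_t n = 1; n < NSAMP; n++)     /* loop-carried dependence */
            y[s][n] = x[s][n] + a * y[s][n - 1];
    }
}

/* Same computation with the loops interchanged: the inner loop over s is
 * now free of dependences, but consecutive iterations touch elements that
 * are NSAMP floats apart in memory (strided access). */
void filter_interchanged(float y[NSIG][NSAMP], float x[NSIG][NSAMP], float a)
{
    for (size_t s = 0; s < NSIG; s++)
        y[s][0] = x[s][0];
    for (size_t n = 1; n < NSAMP; n++)
        for (size_t s = 0; s < NSIG; s++)      /* parallel, stride NSAMP */
            y[s][n] = x[s][n] + a * y[s][n - 1];
}
```

Both functions compute the same result. The sketch only shows why neither loop order maps directly onto contiguous vector loads and stores; it does not reproduce the SLP modification the authors propose.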


REFERENCES


 
[1] ARM11 family. http://www.arm.com/products/CPUs/families/ARM11Family.html.
[2] A. Bik, M. Girkar, P. Grey, and X. Tian. Efficient exploitation of parallelism on Pentium III and Pentium 4 processor-based systems. Intel Technology Journal, 2001.
[3] Intel Corporation. Intel C/C++ and Intel Fortran compilers for Linux. Available at http://www.intel.com/software/products/compilers.
[4] S. Fuller. Motorola's AltiVec technology. Technical Report ALTIVECWP/D, Motorola, 1998.
[5]
[6]
[7]
[8] K. Krewell. Cell moves into the limelight. Microprocessor Report, (2/14/05-01), February 2005.
[9]
[10] S. Larsen, E. Witchel, and S. Amarasinghe. Techniques for increasing and detecting memory alignment. Technical Report MIT-LCS-TM-621, MIT, USA, 2001.
[11]
[12]
[13]

