Auto-vectorization of interleaved data for SIMD
Source: Conference on Programming Language Design and Implementation
Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation
Ottawa, Ontario, Canada
SESSION: Parallelism
Pages: 132-143
Year of Publication: 2006
ISBN: 1-59593-320-4
Authors
Dorit Nuzman, IBM Haifa Labs, Haifa, Israel
Ira Rosen, IBM Haifa Labs, Haifa, Israel
Ayal Zaks, IBM Haifa Labs, Haifa, Israel
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1133981.1133997

ABSTRACT

Most implementations of the Single Instruction Multiple Data (SIMD) model available today require that data elements be packed in vector registers. Operations on disjoint vector elements are not supported directly and require explicit data reorganization manipulations. Computations on non-contiguous and especially interleaved data appear in important applications, which can greatly benefit from SIMD instructions once the data is reorganized properly. Vectorizing such computations efficiently is therefore an ambitious challenge for both programmers and vectorizing compilers. We demonstrate an automatic compilation scheme that supports effective vectorization in the presence of interleaved data with constant strides that are powers of 2, facilitating data reorganization. We demonstrate how our vectorization scheme applies to dominant SIMD architectures, and present experimental results on a wide range of key kernels, showing speedups in execution time up to 3.7 for interleaving levels (stride) as high as 8.
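
To make the notion of interleaved data with a power-of-2 stride concrete, the following is a minimal sketch in C, not the authors' implementation. It assumes stride 2 (interleaved real/imaginary pairs), a hypothetical vector length `VF` of 4 elements, and a hypothetical helper `extract_stride2` that emulates, with plain arrays, the kind of permute a SIMD target would use to pull every second element out of two contiguously loaded vectors. The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* assumed vector length in elements (hypothetical) */

/* Scalar reference: stride-2 access to interleaved (re, im) pairs. */
void norm2_scalar(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int re = in[2 * i];
        int im = in[2 * i + 1];
        out[i] = re * re + im * im;
    }
}

/* Emulates a permute that gathers every second element, starting at
   offset `start` (0 = even positions, 1 = odd positions), from the
   concatenation of two VF-wide vectors `lo` and `hi`. */
static void extract_stride2(const int *lo, const int *hi, int start, int *dst)
{
    for (size_t k = 0; k < VF; k++) {
        size_t idx = 2 * k + (size_t)start;  /* index into the 2*VF concatenation */
        dst[k] = (idx < VF) ? lo[idx] : hi[idx - VF];
    }
}

/* Blocked version: two contiguous "vector loads" per iteration, followed
   by data reorganization, then element-wise arithmetic on packed data. */
void norm2_blocked(const int *in, int *out, size_t n)
{
    assert(n % VF == 0);  /* sketch only: no remainder loop */
    for (size_t i = 0; i < n; i += VF) {
        const int *lo = in + 2 * i;       /* first contiguous load  */
        const int *hi = in + 2 * i + VF;  /* second contiguous load */
        int re[VF], im[VF];
        extract_stride2(lo, hi, 0, re);   /* packed real parts      */
        extract_stride2(lo, hi, 1, im);   /* packed imaginary parts */
        for (size_t k = 0; k < VF; k++)
            out[i + k] = re[k] * re[k] + im[k] * im[k];
    }
}
```

The blocked version touches memory only through contiguous chunks; all non-contiguous selection happens in the reorganization step, which is the part a vectorizing compiler would have to map onto the target's permute or shuffle instructions.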


