Compiling for an indirect vector register architecture
Source
Conference On Computing Frontiers archive
Proceedings of the 5th conference on Computing frontiers table of contents
Ischia, Italy
SESSION: Compilation table of contents
Pages 199-208  
Year of Publication: 2008
ISBN:978-1-60558-077-7
Authors
Dorit Nuzman  IBM Haifa Research Lab, Haifa, Israel
Mircea Namolaru  IBM Haifa Research Lab, Haifa, Israel
Ayal Zaks  IBM Haifa Research Lab, Haifa, Israel
Jeff H. Derby  IBM Corporation, Raleigh, NC, USA
Sponsors
ACM: Association for Computing Machinery
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1366230.1366266

ABSTRACT

The iVMX architecture contains a novel vector register file of up to 4096 vector registers accessed indirectly via a mapping mechanism, providing compatibility with the VMX architecture and the potential for dramatic performance benefits [7]. Using the large number of vector registers and the unique indirection mechanism efficiently poses compilation challenges: the indirection mechanism emphasizes spatial locality of registers and interaction among destination and source operands during register allocation, and the many vector registers call for aggressive automatic vectorization.

This work is a first step in addressing the compilability of iVMX, following the presentation and validation of its architectural aspects [7]. In this paper we present several compilation approaches to deal with the mapping mechanism and an outer-loop vectorization transformation developed to promote the use of many vector registers. We modified an existing register allocator to target all available registers and added a post-pass to rename live-ranges considering spatial locality and interaction among operand types. An FIR filter is used to demonstrate the effectiveness of the techniques developed compared to a version hand-optimized for iVMX. Initial results show that we can reduce the overhead of map management down to 29% of the total instruction count, compared to 22% obtained manually, and compared to 49% obtained using a naive scheme, while outperforming an equivalent VMX implementation by a factor of 2.
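As a rough illustration of why an FIR filter suits outer-loop vectorization, the sketch below contrasts a plain scalar FIR with a version whose outer loop (over output samples) is blocked so that several independent accumulators are live at once. It is not taken from the paper: the function names, the assumed vectorization factor `VF`, and the scalar emulation of vector lanes are all hypothetical, and no iVMX or VMX instructions or map management are modeled.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* assumed number of elements per vector register (hypothetical) */

/* Reference scalar FIR: y[i] = sum over k of c[k] * x[i + k].
 * x must hold at least n_out + n_taps - 1 samples. */
void fir_scalar(const float *x, const float *c, float *y,
                size_t n_out, size_t n_taps)
{
    assert(x != NULL && c != NULL && y != NULL);
    for (size_t i = 0; i < n_out; i++) {          /* outer loop: outputs */
        float acc = 0.0f;
        for (size_t k = 0; k < n_taps; k++)       /* inner loop: taps */
            acc += c[k] * x[i + k];
        y[i] = acc;
    }
}

/* Shape of the loop nest after vectorizing the OUTER loop, emulated in
 * scalar C: each "lane" stands in for one element of a vector register.
 * VF consecutive outputs are accumulated together, so every tap c[k]
 * is reused across all lanes. */
void fir_outer_blocked(const float *x, const float *c, float *y,
                       size_t n_out, size_t n_taps)
{
    assert(x != NULL && c != NULL && y != NULL);
    size_t i = 0;
    for (; i + VF <= n_out; i += VF) {
        float acc[VF] = {0.0f};                   /* one accumulator per lane */
        for (size_t k = 0; k < n_taps; k++)
            for (size_t lane = 0; lane < VF; lane++)
                acc[lane] += c[k] * x[i + lane + k];
        for (size_t lane = 0; lane < VF; lane++)
            y[i + lane] = acc[lane];
    }
    for (; i < n_out; i++) {                      /* scalar remainder loop */
        float acc = 0.0f;
        for (size_t k = 0; k < n_taps; k++)
            acc += c[k] * x[i + k];
        y[i] = acc;
    }
}
```

Both functions accumulate the taps of each output in the same order, so they should produce identical results. On a real vector target the lane loops would become single vector operations, and unrolling further over taps or output blocks is one way such a kernel can keep many vector registers in use.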


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
4
5
6
7
 
8
J. H. Derby and J. H. Moreno. A high-performance embedded DSP core with novel SIMD features. In ICASSP, 2003.
 
9
Free Software Foundation. gcc.gnu.org/projects/tree-ssa/vectorization.html.
 
10
Freescale Semiconductor, http://www.freescale.com. Altivec real fir, October 2002.
11
 
12
 
13
 
14
 
15
D. Naishlos. Autovectorization in gcc. In GCC Developer's summit, pages 105--118, June 2004.
16
 
17
M. Namolaru. Register allocation techniques for ivmx architecture. In Int'l Workshop on GCC for Research in Embedded and Parallel Systems, September 2007.
 
18
D. Nuzman and A. Zaks. Autovectorization in gcc - two years later. In GCC Developer's summit, June 2006.
 
19
 
20
 
21
 
22
 
23
 
24
C. Tenllado, L. Piñuel, M. Prieto, and F. Catthoor. Pack transposition: Enhancing superword level parallelism exploitation. In Parallel Computing, 2005.
25
 
26
27

