Versatility of extended subwords and the matrix register file
Source
ACM Transactions on Architecture and Code Optimization (TACO)
Volume 5, Issue 1 (May 2008)
Article No. 5
Year of Publication: 2008
ISSN: 1544-3566
Authors
Asadollah Shahbahrami, Delft University of Technology, The Netherlands
Ben Juurlink, Delft University of Technology, The Netherlands
Stamatis Vassiliadis, Delft University of Technology, The Netherlands
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1369396.1369401

ABSTRACT

Extended subwords and the matrix register file (MRF) are two microarchitectural techniques that address some of the limitations of existing SIMD architectures. Extended subwords are wider than the data stored in memory: for every byte of data stored in memory, there are four extra bits in the media register file. This avoids the need for data-type conversion instructions. The MRF is a register file organization that provides both conventional row-wise access and column-wise access to the register file. In other words, it allows the register file to be viewed as a matrix in which corresponding subwords in different registers form a column of the matrix. It was introduced to accelerate matrix transposition, which is a very common operation in multimedia applications. In this paper, we show that the MRF is very versatile, since it can also be used for permutations other than matrix transposition. Specifically, we show how it can be used to provide efficient access to strided data, as is needed in, for example, color space conversion. Furthermore, we show that special-purpose instructions (SPIs), such as the sum-of-absolute-differences (SAD) instruction, have limited usefulness when extended subwords and a few general SIMD instructions that we propose are supported, for two reasons. First, when extended subwords are supported, the SAD instruction provides only a relatively small performance improvement. Second, the SAD instruction processes 8-bit subwords only, which is sufficient neither for quarter-pixel resolution nor for the cost functions used in image and video retrieval. Results obtained by extending the SimpleScalar toolset show that the proposed techniques provide a speedup of up to 3.00 over the MMX architecture. The results also show that using at most 13 extra media registers yields an additional performance improvement ranging from 1.38 to 1.57.
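
The two ideas in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design or instruction set: the class `MatrixRegisterFile`, the helpers `transpose`, `deinterleave_rgb`, and `sad_extended`, and the choice of unsigned 12-bit lanes are all assumptions made for illustration. It shows row-wise writes followed by column-wise reads (which yields a transposition and, for interleaved RGB pixels, stride-3 access) and a sum of absolute differences built from lane-wise operations that accumulate in 12-bit subwords.

```python
# Toy model only; names and lane semantics are hypothetical, not from the paper.
SUBWORD_BITS = 12                      # 8 data bits + 4 extra bits per subword
SUBWORD_MAX = (1 << SUBWORD_BITS) - 1  # assumes unsigned lanes (an assumption)


class MatrixRegisterFile:
    """Register file readable row-wise (one register) or column-wise (one lane)."""

    def __init__(self, num_regs, subwords_per_reg):
        self.num_regs = num_regs
        self.subwords_per_reg = subwords_per_reg
        self.regs = [[0] * subwords_per_reg for _ in range(num_regs)]

    def write_row(self, reg, subwords):
        if len(subwords) != self.subwords_per_reg:
            raise ValueError("wrong number of subwords for one register")
        if any(not 0 <= s <= SUBWORD_MAX for s in subwords):
            raise OverflowError("value does not fit in an extended subword")
        self.regs[reg] = list(subwords)

    def read_row(self, reg):
        return list(self.regs[reg])

    def read_column(self, lane):
        # Gather the subword at position `lane` from every register.
        return [self.regs[r][lane] for r in range(self.num_regs)]


def transpose(matrix):
    """Write rows, then read columns: the result is the transposed matrix."""
    mrf = MatrixRegisterFile(len(matrix), len(matrix[0]))
    for r, row in enumerate(matrix):
        mrf.write_row(r, row)
    return [mrf.read_column(c) for c in range(len(matrix[0]))]


def deinterleave_rgb(interleaved):
    """Split R,G,B,R,G,B,... (stride 3) into three channel lists."""
    pixels = [interleaved[i:i + 3] for i in range(0, len(interleaved), 3)]
    red, green, blue = transpose(pixels)
    return red, green, blue


def sad_extended(block_a, block_b):
    """SAD from lane-wise subtract, absolute value, and add in 12-bit lanes."""
    width = len(block_a[0])
    acc = [0] * width
    for row_a, row_b in zip(block_a, block_b):
        for i in range(width):
            acc[i] += abs(row_a[i] - row_b[i])
            if acc[i] > SUBWORD_MAX:
                raise OverflowError("lane exceeds 12-bit extended subword")
    return sum(acc)  # final reduction across lanes
```

In this model, a 12-bit lane can accumulate the absolute differences of 16 rows of 8-bit pixels without overflow, because 16 × 255 = 4080 is below 4095; a 17th worst-case row would overflow, which is where a real design would need a wider accumulator or an earlier reduction.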

