Efficient orchestration of sub-word parallelism in media processors
Source: ACM Symposium on Parallel Algorithms and Architectures
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Barcelona, Spain
SESSION: Shared memory and architecture
Pages: 225 - 234  
Year of Publication: 2004
ISBN:1-58113-840-7
Authors
John Oliver  University of California at Davis, CA
Venkatesh Akella  University of California at Davis, CA
Frederic Chong  University of California at Davis, CA
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1007912.1007946

ABSTRACT

Communication and multimedia applications with increased data rates and enhanced functionality continuously raise the bar for the computational requirements of future microprocessors. To meet these computational demands, it is necessary to exploit sub-word parallelism efficiently. We propose to make sub-word data movement a first-class operation in microprocessor architectures by introducing a Sub-word Permutation Unit (SPU) in the execution pipeline. The SPU is evaluated in the context of the MMX media co-processor for the Intel Pentium architectures, but our results can be extended to any processor that supports sub-word parallelism. We find that the SPU allows us to orchestrate sub-word data placement prior to computation, thus allowing the MMX functional units to concentrate on performing calculations. Furthermore, we introduce a decoupled SPU control mechanism at the basic block level, which allows static optimization to eliminate data-movement overhead in tight loops, where most media and signal processing occurs. We demonstrate that anywhere from 4% to 20% improvement can be obtained on key media and signal processing kernels with as little as a 1% increase in hardware resources.
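
The idea of placing sub-words before the computation can be illustrated with a small software model. The sketch below is not the authors' SPU design or the MMX instruction set; it assumes four 16-bit lanes packed into a 64-bit word, and the names pack, unpack, permute, and packed_add are hypothetical helpers introduced only for illustration. It shows a permutation step arranging one operand's lanes first, so that the lane-wise add that follows does arithmetic only.

```python
# Illustrative model only: four 16-bit sub-words in one 64-bit word (an assumption).
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes):
    """Pack four 16-bit values into one integer, lane 0 in the low bits."""
    assert len(lanes) == LANES
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a packed word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def permute(word, order):
    """Hypothetical permutation step: output lane i takes input lane order[i]."""
    src = unpack(word)
    return pack([src[j] for j in order])


def packed_add(a, b):
    """Lane-wise add that wraps around within each 16-bit lane."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 4])
b = pack([10, 20, 30, 40])

# Reverse b's lanes first; the add then sees operands that are already aligned.
result = unpack(packed_add(a, permute(b, [3, 2, 1, 0])))
print(result)
```

In this model the permutation is an ordinary function call; the abstract's proposal is to perform that data movement in a dedicated unit so that the arithmetic units are not spent on it.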



