Complementing software pipelining with software thread integration
Source: ACM SIGPLAN Notices
Volume 40, Issue 7 (July 2005)
Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
SESSION: Hardware supported optimization
Pages: 137-146
Year of Publication: 2005
ISSN:0362-1340
Authors
Won So, North Carolina State University, Raleigh, NC
Alexander G. Dean, North Carolina State University, Raleigh, NC
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1070891.1065930

ABSTRACT

Software pipelining is a critical optimization for producing efficient code for VLIW/EPIC and superscalar processors in high-performance embedded applications such as digital signal processing. Software thread integration (STI) can often improve the performance of looping code in cases where software pipelining performs poorly or fails. This paper examines both situations, presenting methods to determine what and when to integrate.

We evaluate our methods on C-language image and digital signal processing libraries and synthetic loop kernels. We compile them for a very long instruction word (VLIW) digital signal processor (DSP) -- the Texas Instruments (TI) C64x architecture. Loops that benefit little from software pipelining (SWP-Poor) speed up by 26% (harmonic mean, HM). Loops for which software pipelining fails (SWP-Fail) due to conditionals and calls speed up by 16% (HM). Combining SWP-Good and SWP-Poor loops leads to a speedup of 55% (HM).
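To make the idea of integrating looping code concrete, the following is a minimal C sketch, not the authors' tool or method. It assumes two hypothetical, independent kernels: `scale`, whose iterations are independent, and `checksum`, whose loop-carried dependence on `acc` limits how tightly a scheduler can overlap its iterations. The hand-written `scale_and_checksum` merges both loop bodies into one loop so that the independent work can fill otherwise idle issue slots; all function names and the choice of kernels are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical kernel A: iterations are independent of one another. */
static void scale(const int *in, int *out, size_t n, int k) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Hypothetical kernel B: each iteration depends on the previous acc,
 * a recurrence that bounds how far iterations can be overlapped. */
static unsigned checksum(const unsigned *in, size_t n) {
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc * 31u + in[i];
    return acc;
}

/* Hand-integrated version (illustrative only): one loop carries both
 * bodies while both kernels still have work, then clean-up loops
 * finish whichever kernel has iterations left. */
static unsigned scale_and_checksum(const int *a, int *out, size_t na, int k,
                                   const unsigned *b, size_t nb) {
    unsigned acc = 0;
    size_t shared = na < nb ? na : nb;
    size_t i;

    /* Integrated loop: the two bodies share no data, so a scheduler
     * is free to interleave their operations. */
    for (i = 0; i < shared; i++) {
        out[i] = a[i] * k;
        acc = acc * 31u + b[i];
    }
    /* Clean-up for the remaining iterations of kernel A, if any. */
    for (size_t j = i; j < na; j++)
        out[j] = a[j] * k;
    /* Clean-up for the remaining iterations of kernel B, if any. */
    for (size_t j = i; j < nb; j++)
        acc = acc * 31u + b[j];
    return acc;
}
```

Calling `scale_and_checksum(a, out, na, k, b, nb)` should produce the same `out` array and checksum as calling `scale` and `checksum` separately. Whether the merged loop actually runs faster depends on the target and compiler, which is the question of what and when to integrate that the abstract raises.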

