Loop fusion for clustered VLIW architectures
Source: Language, Compiler and Tool Support for Embedded Systems
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Berlin, Germany
SESSION: Code Generation
Pages: 112 - 119  
Year of Publication: 2002
ISBN:1-58113-527-0
Authors
Yi Qian, Michigan Technological University, Houghton, MI
Steve Carr, Michigan Technological University, Houghton, MI
Philip Sweany, Texas Instruments, Dallas, TX
Sponsor
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/513829.513850

ABSTRACT

Embedded systems require maximum performance from a processor within significant constraints in power consumption and chip cost. Using software pipelining, high-performance digital signal processors can often exploit considerable instruction-level parallelism (ILP), and thus significantly improve performance. However, software pipelining, in some instances, hinders the goals of low power consumption and low chip cost. Specifically, the registers required by a software pipelined loop may exceed the size of the physical register set.

The register pressure problem incurred by software pipelining makes it difficult to build a high-performance embedded processor with a single, multi-ported register bank with enough registers to support high levels of ILP while maintaining clock speed and limiting power consumption. The large number of ports required to support a single register bank severely hampers access time. The port requirement for a register bank can be reduced via hardware by partitioning the register bank into multiple banks connected to disjoint subsets of functional units, called clusters. Since a functional unit is not directly connected to all register banks, wasted energy and resources can result due to delays incurred when accessing "non-local" registers.

The overhead due to partitioning of the register set can be ameliorated by using high-level compiler loop optimization techniques such as unrolling, unroll-and-jam and fusion. High-level loop optimizations spread data-independent parallelism across clusters that may not require "non-local" register accesses and can provide work to hide the latency of any such register accesses that are needed.

In this paper, we examine the effects of loop fusion on DSP loops run on four simulated, clustered VLIW architectures and the Texas Instruments TMS320C64x. Our experiments show a harmonic mean speedup of 1.3 to 2.
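
As an illustration of the loop fusion transformation named in the abstract, the following C sketch shows two adjacent loops over the same index range before and after fusion. It is not taken from the paper: the function names, array names, and the arithmetic in the loop bodies are hypothetical, and the sketch assumes the four arrays do not overlap, which is what makes the fusion legal here. After fusion, the single loop body holds two statements with no data dependence between them, the kind of independent work that a clustered VLIW compiler could, in principle, assign to different clusters without "non-local" register accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Before fusion: two separate loops with identical bounds.
 * Assumption: a, b, c, d point to non-overlapping arrays of length n. */
void scale_and_offset_unfused(const float *a, const float *b,
                              float *c, float *d, size_t n, float k)
{
    assert(c != d);                 /* documents the no-aliasing assumption */
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * k;            /* loop 1 */
    for (size_t i = 0; i < n; i++)
        d[i] = b[i] + k;            /* loop 2, independent of loop 1 */
}

/* After fusion: one loop whose body contains both statements.
 * The two statements share no data, so they expose independent
 * operations within a single iteration. */
void scale_and_offset_fused(const float *a, const float *b,
                            float *c, float *d, size_t n, float k)
{
    assert(c != d);
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] * k;
        d[i] = b[i] + k;
    }
}

/* Returns 1 if the fused and unfused versions agree on a small input. */
int fusion_preserves_results(void)
{
    enum { N = 8 };
    float a[N], b[N], c1[N], d1[N], c2[N], d2[N];

    for (size_t i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = (float)(N - i);
    }

    scale_and_offset_unfused(a, b, c1, d1, N, 2.0f);
    scale_and_offset_fused(a, b, c2, d2, N, 2.0f);

    for (size_t i = 0; i < N; i++) {
        if (c1[i] != c2[i] || d1[i] != d2[i])
            return 0;
    }
    return 1;
}
```

Calling `fusion_preserves_results()` checks that both versions compute the same values for this input. Whether fusion actually improves performance on a given clustered machine depends on the register partitioning and scheduling, which this sketch does not model.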

