Unroll-based register coalescing
Source: International Conference on Supercomputing
Proceedings of the 14th International Conference on Supercomputing
Santa Fe, New Mexico, United States
Pages: 296 - 305  
Year of Publication: 2000
ISBN: 1-58113-270-0
Authors
Suhyun Kim  School of Electrical Engineering, Seoul National University, Seoul 151-742, Korea
Soo-Mook Moon  School of Electrical Engineering, Seoul National University, Seoul 151-742, Korea
Jinpyo Park  School of Electrical Engineering, Seoul National University, Seoul 151-742, Korea
Kemal Ebcioğlu  IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/335231.335260

ABSTRACT

Aggressive instruction scheduling leaves behind many renaming copy instructions that cannot be coalesced due to interferences. These copies take resources, and more seriously, they may cause a stall if they are generated for renaming of multi-latency instructions. This paper proposes a code transformation technique based on loop unrolling which makes those copies coalescible. Two unique features of the technique are its method of determining the precise unroll amount based on an idea of extended live range, and its insertion of special bookkeeping copies at loop exits. In fact, the technique provides a more general and simpler solution for the cross-iteration register overwrite problem in software pipelining which works for loops with control flows as well as for straight-line loops. In addition, it is applicable to other optimizations including path length reduction and redundant subscripted reference elimination.

Our empirical study performed on a 16-ALU VLIW testbed with a two-cycle load latency shows that 86% of the otherwise uncoalescible copies in innermost loops become coalescible when unrolled 2.2 times on average. In addition, it is demonstrated that the unroll amount obtained is precise and the most efficient. The unrolled version of the VLIW code includes fewer no-op VLIWs caused by stalls, improving the performance by a geometric mean of 18%.
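The general idea of removing a loop-carried copy by unrolling can be illustrated with a small source-level sketch. This is only an analogy under stated assumptions, not the paper's algorithm: the function names, the variables `prev`, `cur`, `r0`, and `r1` (standing in for registers), and the fixed unroll factor of 2 are all hypothetical, and the paper's method for choosing the unroll amount from extended live ranges is not reproduced here. The input list is assumed to be non-empty.

```python
def with_copy(a):
    """Loop that executes a renaming-style copy (prev = cur) on every iteration."""
    prev = a[0]
    out = []
    for i in range(1, len(a)):
        cur = a[i]               # value produced into a fresh "register"
        out.append(prev + cur)
        prev = cur               # copy executed in every iteration
    return out, prev


def unrolled_by_two(a):
    """Same computation unrolled twice; r0 and r1 swap roles, so no copy in the body."""
    r0 = a[0]
    out = []
    i, n = 1, len(a)
    while i + 1 < n:
        r1 = a[i]                # first copy of the body: r0 is old, r1 is new
        out.append(r0 + r1)
        r0 = a[i + 1]            # second copy of the body: roles are swapped
        out.append(r1 + r0)
        i += 2
    if i < n:                    # the loop is left after an odd number of iterations
        r1 = a[i]
        out.append(r0 + r1)
        r0 = r1                  # bookkeeping copy, executed only on this exit path
    return out, r0


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(with_copy(data) == unrolled_by_two(data))
```

In this sketch the per-iteration copy disappears from the unrolled body, and a single copy on the exit path puts the live value back where code after the loop expects it, which is the role the abstract assigns to bookkeeping copies at loop exits.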



