Reduced code size modulo scheduling in the absence of hardware support
Source: International Symposium on Microarchitecture
Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture
Istanbul, Turkey
SESSION: Compiler scheduling
Pages: 99-110
Year of Publication: 2002
ISSN: 1072-4451; ISBN: 0-7695-1859-1
Authors
Josep Llosa, Universitat Politècnica de Catalunya
Stefan M. Freudenberger, Hewlett-Packard Laboratories, Cambridge, Mass.
Sponsors
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE TC-uArch
Publisher
IEEE Computer Society Press, Los Alamitos, CA, USA

ABSTRACT

Modulo scheduling is a very effective instruction scheduling technique that exploits Instruction Level Parallelism (ILP) in loop bodies by overlapping the execution of successive iterations. Unfortunately, modulo scheduling has been shown to cause heavy code expansion. To avoid the penalties of code expansion, some processors provide dedicated hardware support for modulo-scheduled loops. However, this dedicated hardware support has a cost in chip area, cycle time, processor complexity, and compiler complexity. This paper shows that the right combination of scheduling heuristics and speculative modulo scheduling can significantly reduce code expansion. In addition, several code generation schema heuristics are proposed to further reduce code expansion. The evaluations show that loops can be effectively modulo scheduled with an average code expansion of only 1.5 times the original loop size. Compared with a state-of-the-art modulo scheduler, our code-size-sensitive heuristics reduce the size of embedded-domain benchmark binaries by 30% on average. While performance is mostly unchanged, some applications show speed-ups of up to 20% due to a reduction in instruction cache capacity misses.
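To make the source of this code expansion concrete, here is a minimal Python sketch of a toy static-size model for the conventional prologue/kernel/epilogue code generation schema used when no hardware support is available. It assumes each operation has a known issue cycle in the single-iteration schedule and a known register lifetime, and that, without rotating registers, the kernel must be unrolled for modulo variable expansion. The names `Op`, `stage_count`, `mve_unroll`, and `code_size` are hypothetical and not taken from the paper. The model also ignores the extra epilogues or preconditioning code that kernel unrolling may require, so it is an illustration only, not the authors' method or measurement.

```python
import math
from dataclasses import dataclass


@dataclass
class Op:
    """One loop-body operation in a flat single-iteration schedule (hypothetical toy model)."""
    name: str
    start: int         # issue cycle relative to the start of its own iteration
    lifetime: int = 0  # cycles its result must stay live; 0 if it defines no value


def stage_count(ops, ii):
    """Number of II-cycle stages a single iteration spans (SC)."""
    return max(op.start // ii for op in ops) + 1


def mve_unroll(ops, ii):
    """Kernel unroll factor for modulo variable expansion, assuming no rotating registers."""
    return max(1, max(math.ceil(op.lifetime / ii) for op in ops))


def code_size(ops, ii):
    """Static operation count of prologue + kernel + epilogue in this toy model."""
    sc = stage_count(ops, ii)
    per_stage = [0] * sc
    for op in ops:
        per_stage[op.start // ii] += 1
    # Prologue block i (i = 0..SC-2) issues stages 0..i of the overlapped iterations.
    prologue = sum(sum(per_stage[: i + 1]) for i in range(sc - 1))
    # Epilogue block j (j = 0..SC-2) drains stages j+1..SC-1.
    epilogue = sum(sum(per_stage[j + 1:]) for j in range(sc - 1))
    # The kernel holds every operation once per unrolled copy.
    kernel = len(ops) * mve_unroll(ops, ii)
    return prologue + kernel + epilogue


# Toy loop: load -> multiply -> store, scheduled with II = 2.
loop = [Op("load", 0, lifetime=2), Op("mul", 2, lifetime=3), Op("store", 5)]
print(stage_count(loop, 2), mve_unroll(loop, 2), code_size(loop, 2))  # 3 2 12
print(code_size(loop, ii=2) / len(loop))  # 4.0
```

In this model the prologue and epilogue together always contain (SC - 1) copies of the loop body, so the static size is (SC - 1 + U) times the original loop, where U is the kernel unroll factor. A scheduler that cares about code size therefore has an incentive to keep both the stage count and the register lifetimes small; hardware features such as rotating register files remove the need for the U-fold kernel unrolling.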



