Reduced code size modulo scheduling in the absence of hardware support
Source
International Symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Istanbul, Turkey
SESSION: Compiler scheduling
Pages: 99 - 110
Year of Publication: 2002
ISSN: 1072-4451, ISBN: 0-7695-1859-1
Publisher
IEEE Computer Society Press
Los Alamitos, CA, USA
Bibliometrics
Downloads (6 Weeks): 4, Downloads (12 Months): 12, Citation Count: 3
ABSTRACT
Modulo scheduling is a very effective instruction scheduling technique that exploits Instruction Level Parallelism (ILP) in loop bodies by overlapping the execution of successive iterations. Unfortunately, modulo scheduling has been shown to cause heavy code expansion. To avoid the penalties of code expansion, some processors have dedicated hardware support for modulo scheduled loops. However, this dedicated hardware support has a cost in chip area, cycle time, processor complexity, and compiler complexity.

This paper shows that the right scheduling heuristics, combined with speculative modulo scheduling, can significantly reduce code expansion. In addition, several code generation schema heuristics are proposed to further reduce code expansion. The evaluations show that loops can be effectively modulo scheduled with an average code expansion of only 1.5 times the original loop size. Compared with a state-of-the-art modulo scheduler, our code-size-sensitive heuristics reduce the size of embedded-domain benchmark binaries by 30% on average. While performance is mostly unchanged, some applications show speed-ups of up to 20% due to a reduction in instruction cache capacity misses.
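To make the code expansion mentioned in the abstract concrete, here is a minimal sketch. It is not the paper's method. It assumes the textbook prologue/kernel/epilogue code schema for a modulo scheduled loop without predication or rotating registers, ignores any kernel unrolling for register renaming, and counts operations only. The function names and the per-stage operation counts are hypothetical.

```python
def expanded_size(stage_ops):
    """Count operations in the prologue, kernel, and epilogue.

    stage_ops[s] is the (hypothetical) number of operations the schedule
    places in pipeline stage s of one loop iteration.
    Assumption: explicit prologue/kernel/epilogue code, no hardware support.
    """
    sc = len(stage_ops)  # stage count of the schedule
    # Prologue step k fills the pipeline: it runs stages 0..k of the
    # iterations started so far.
    prologue = sum(sum(stage_ops[: k + 1]) for k in range(sc - 1))
    # The kernel runs every stage once per trip.
    kernel = sum(stage_ops)
    # Epilogue step k drains the pipeline: it runs stages k+1..sc-1.
    epilogue = sum(sum(stage_ops[k + 1 :]) for k in range(sc - 1))
    return prologue, kernel, epilogue


def expansion_ratio(stage_ops):
    """Expanded size divided by the size of the original loop body."""
    prologue, kernel, epilogue = expanded_size(stage_ops)
    return (prologue + kernel + epilogue) / sum(stage_ops)


if __name__ == "__main__":
    ops = [3, 2, 1]  # hypothetical three-stage schedule
    print(expanded_size(ops))
    print(expansion_ratio(ops))
```

Under these assumptions each prologue step pairs with an epilogue step to form one full copy of the loop body, so the expansion factor equals the number of stages in the schedule. This is only an illustrative model of why schedule shape affects code size.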
CITED BY 3
Hongbo Rong, Alban Douillet, R. Govindarajan, Guang R. Gao, Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.175, March 20-24, 2004, Palo Alto, California
Kevin Fan, Manjunath Kudlur, Hyunchul Park, Scott Mahlke, Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.219-232, November 12-16, 2005, Barcelona, Spain