ABSTRACT
Delayed branching is a technique to alleviate branch hazards without expensive hardware branch prediction mechanisms. For VLIW processors with deep pipelines and many issue slots, the instruction scheduler faces the difficult problem of filling the many delay slots. This paper proposes two solutions: a code hoisting technique that produces more candidate operations to be put in the delay slots, and an adapted backtracking instruction scheduler that is capable of efficiently placing these candidate operations in the delay slots. We have demonstrated that the two mechanisms work well on various multimedia and SPECINT2000 benchmarks. The code hoisting technique reduces the schedule length of a traditional scheduler without backtracking by 18%. Using the backtracking scheduler, this reduction increases to 24%.
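To give a sense of why deep pipelines and wide issue make delay slots hard to fill, the following minimal sketch counts the operation slots that fall in the shadow of a delayed branch. It assumes a simple model in which every cycle between the issue of the branch and its taking effect offers one operation slot per issue slot. The function names, parameter names, and example numbers are hypothetical illustrations and are not taken from the paper.

```python
def delay_slot_capacity(branch_delay_cycles: int, issue_width: int) -> int:
    """Operation slots available in the delay cycles of one branch.

    Assumed model (not from the paper): each delay cycle exposes
    `issue_width` operation slots that the scheduler may fill.
    """
    if branch_delay_cycles < 0 or issue_width < 1:
        raise ValueError("need branch_delay_cycles >= 0 and issue_width >= 1")
    return branch_delay_cycles * issue_width


def fill_ratio(filled_ops: int, branch_delay_cycles: int, issue_width: int) -> float:
    """Fraction of the delay-slot capacity occupied by useful operations."""
    capacity = delay_slot_capacity(branch_delay_cycles, issue_width)
    if not 0 <= filled_ops <= capacity:
        raise ValueError("filled_ops must lie between 0 and the capacity")
    # With no delay cycles there is nothing to fill, so report full use.
    return 1.0 if capacity == 0 else filled_ops / capacity


if __name__ == "__main__":
    # Hypothetical machine: 3 delay cycles, 8 issue slots.
    print(delay_slot_capacity(3, 8))
    print(fill_ratio(6, 3, 8))
```

Under this model the capacity grows with the product of pipeline depth and issue width, which is why additional candidate operations (from hoisting) and a scheduler able to place them (backtracking) both matter.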