ABSTRACT
As technology has advanced, the application space of Very Long Instruction Word (VLIW) processors has grown to include a variety of embedded platforms. Due to cost and power consumption constraints, many embedded VLIW processors have limited resources, including registers. As a result, a VLIW compiler that maximizes instruction-level parallelism (ILP) without considering register constraints may generate excessive register spills, reducing overall system performance. To address this issue, this article presents a new spill reduction technique that improves VLIW runtime performance by reordering operations prior to register allocation and instruction scheduling. Unlike earlier algorithms, our approach explicitly considers both register reduction and data dependency when reordering operations. Data dependency control limits unexpected schedule length increases during subsequent instruction scheduling. Our technique has been evaluated using Trimaran, an academic VLIW compiler, on a set of embedded systems benchmarks. Experimental results show that, on average, this technique improves VLIW performance by 10% for VLIW processors with 32 registers and 8 functional units compared with previous spill reduction techniques. Limited improvement over prior approaches is seen for VLIW processors with 64 registers and 8 functional units.