|
ABSTRACT
This article investigates register allocation for software pipelined multidimensional loops where the execution of successive iterations from an n-dimensional loop is overlapped. For single loop software pipelining, the lifetimes of a loop variable in successive iterations of the loop form a repetitive pattern. An effective register allocation method is to represent the pattern as a vector of lifetimes (or a vector lifetime using Rau's terminology [Rau 1992]) and map it to rotating registers. Unfortunately, the software pipelined schedule of a multidimensional loop is considerably more complex and so are the vector lifetimes in it. In this article, we develop a way to normalize and represent the vector lifetimes, which captures their complexity, while exposing their regularity that enables a simple solution. The problem is formulated as bin-packing of the multidimensional vector lifetimes on the surface of a space-time cylinder. A metric, called distance, is calculated either conservatively or aggressively to guide the bin-packing process, so that there is no overlapping between any two vector lifetimes, and the register requirement is minimized. This approach subsumes the classical register allocation for software pipelined single loops as a special case. The method has been implemented in the ORC compiler and produced code for the IA-64 architecture. Experimental results show the effectiveness. Several strategies for register allocation are compared and analyzed.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
J. R. Allen , Ken Kennedy , Carrie Porterfield , Joe Warren, Conversion of control dependence to data dependence, Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, p.177-189, January 24-26, 1983, Austin, Texas
[doi> 10.1145/567067.567085]
|
 |
4
|
|
 |
5
|
|
| |
6
|
|
 |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
Douillet, A. and Gao, G. R. 2005. Register pressure in software-pipelined loop nests: Fast computation and impact on architecture design. In The 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC'05). Hawthorne, NY, 17--31.
|
| |
13
|
|
| |
14
|
|
| |
15
|
Gao, G. R., Ning, Q., and Dongen, V. V. 1993. Software pipelining for nested loops. ACAPS Tech Memo 53, School of Computer Science, McGill Univ., Montréal, Québec.
|
| |
16
|
|
 |
17
|
|
| |
18
|
Intel. 2001. Intel IA-64 Architecture Software Developer's Manual. Vol. 1: IA-64 Application Architecture. Intel Corporation, Santa Clara, CA.
|
 |
19
|
|
 |
20
|
|
| |
21
|
Lawler, E. L., Lenstra, J. K., Khan, A. H. G. R., and Shmoys, D. B. 1985. The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. John Wiley & Sons.
|
 |
22
|
|
| |
23
|
|
| |
24
|
|
 |
25
|
|
| |
26
|
|
| |
27
|
Rau, B. R., Lee, M., Tirumalai, P. P., and Schlansker, M. S. 1992. Register allocation for modulo scheduled loops: Strategies, algorithms and heuristics. HP Labs Tech. rep. HPL-92-48, Hewlett-Packard Laboratories, Palo Alto, CA.
|
| |
28
|
Hongbo Rong , Alban Douillet , R. Govindarajan , Guang R. Gao, Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.175, March 20-24, 2004, Palo Alto, California
|
| |
29
|
Rong, H. and Govindarajan, R. 2007. Advances in software piplining. In The Compiler Design Handbook: Optimization and Machine Code Generation, 2nd Ed. Y. N. Srikant and P. Shankar, Eds. CRC, Chapter 20.
|
| |
30
|
Rong, H., Tang, Z., Govindarajan, R., Douillet, A., and Gao, G. R. 2007a. Single-dimension software pipelining for multi-dimensional loops. CAPSL Technical Memo, Department of Electrical and Computer Engineering, University of Delaware, Newark, DE. In ftp://ftp.capsl.udel.edu/pub/doc/memos/memo049.ps.gz.
|
 |
31
|
|
| |
32
|
Turkington, K., Masselos, K., Constantinides, G. A., and Leong, P. 2006. FPGA based acceleration of the linpack benchmark: A high level code transformation approach. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL). Madrid, Spain. IEEE, 1--6.
|
| |
33
|
|
|