ABSTRACT
Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floating-point registers, which are most often used as temporary repositories for subscripted variables. In this paper, we present a source-to-source transformation, called scalar replacement, that finds opportunities for reuse of subscripted variables and replaces the references involved by references to temporary scalar variables. The objective is to increase the likelihood that these elements will be assigned to registers by the coloring-based register allocators found in most compilers. In addition, we present transformations to improve the overall effectiveness of scalar replacement and show how these transformations can be applied in a variety of loop nest types. Finally, we present experimental results showing that these techniques are extremely effective, capable of achieving integer-factor speedups over code generated by good optimizing compilers of conventional design.
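As a rough illustration of the idea, the sketch below shows one loop before and after a hand-applied scalar replacement. It is not taken from the paper, whose transformation is a source-to-source one rather than this C rendering. The function names and the temporary `t` are hypothetical, and the code assumes that `a` and `b` do not overlap in memory.

```c
#include <stddef.h>

/* Original loop: the value read as a[i - 1] was stored as a[i] by the
 * previous iteration, but a compiler that treats the array as a single
 * unit may reload it from memory every time. */
void recurrence_original(double *a, const double *b, size_t n) {
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i - 1] + b[i];
    }
}

/* Hand-applied scalar replacement (assumes a and b do not alias):
 * the reused element is carried across iterations in the scalar t,
 * which a coloring-based allocator can keep in a register. */
void recurrence_scalar_replaced(double *a, const double *b, size_t n) {
    if (n == 0) {
        return;
    }
    double t = a[0];            /* initial load of the reused element */
    for (size_t i = 1; i < n; i++) {
        t = t + b[i];           /* t plays the role of a[i - 1], then a[i] */
        a[i] = t;               /* the store remains; the load is gone */
    }
}
```

Both versions perform the same floating-point additions in the same order, so they should produce identical arrays. The second version performs one fewer load of `a` per iteration.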
CITED BY
Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, Wen-mei W. Hwu. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 20-23, 2008, Salt Lake City, UT, USA.