|
ABSTRACT
This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifies the various transforms as unimodular matrix transformations.The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), LU decomposition without pivoting, and Givens QR factorization. Performance evaluation indicates that locality optimization is especially crucial for scaling up the performance of parallel code.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
P. Feautrier. Dataflow analysis of scalar and array references. Journal of Parallel and Distributed Computing, 20(1):23--53, February 1991.
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
 |
6
|
|
| |
7
|
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
U. Banerjee. Data dependence in ordinary programs. Technical Report 76--837, University of Illinios at Urbana-Champaign, Nov 1976.
|
| |
12
|
|
| |
13
|
U. Banerjee. Unimodular transformations of double loops. In 3rd Workshop on Languages and Compilers for Parallel Computing, Aug 1990.
|
 |
14
|
|
 |
15
|
|
| |
16
|
K. Gallivan, W. Jalby, U. Meier, and A. Sameh. The impact of hierachical memory systems on linear algebra algorithm design. Technical report, University of Illinios, 1987.
|
| |
17
|
|
| |
18
|
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
|
| |
19
|
F. Irigoin and R. Triolet. Computing dependence direction vectors and dependence cones. Technical Report E94, Centre D'Automatique et Informatique, 1988.
|
 |
20
|
|
 |
21
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
 |
22
|
|
| |
23
|
|
| |
24
|
R. Schreiber and J. Dongarra. Automatic blocking of nested loops. 1990.
|
| |
25
|
|
| |
26
|
M. J. Wolfe. Techniques for improving the inherent parallelism in programs. Technical Report UIUCDCS-R-78-929, University of Illinois, 1978.
|
 |
27
|
|
|