ACM Home Page
Please provide us with feedback. Feedback
Automatic tiling of iterative stencil loops
Full text PdfPdf (948 KB)
Source ACM Transactions on Programming Languages and Systems (TOPLAS) archive
Volume 26 ,  Issue 6  (November 2004) table of contents
Pages: 975 - 1028  
Year of Publication: 2004
ISSN:0164-0925
Authors
Zhiyuan Li  Purdue University, West Lafayette, IN
Yonghong Song  Purdue University, West Lafayette, IN
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 79,   Citation Count: 7
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1034774.1034777
What is a DOI?

ABSTRACT

Iterative stencil loops are used in scientific programs to implement relaxation methods for numerical simulation and signal processing. Such loops iteratively modify the same array elements over different time steps, which presents opportunities for the compiler to improve the temporal data locality through loop tiling. This article presents a compiler framework for automatic tiling of iterative stencil loops, with the objective of improving the cache performance. The article first presents a technique which allows loop tiling to satisfy data dependences in spite of the difficulty created by imperfectly nested inner loops. It does so by skewing the inner loops over the time steps and by applying a uniform skew factor to all loops at the same nesting level. Based on a memory cost analysis, the article shows that the skew factor must be minimized at every loop level in order to minimize cache misses. A graph-theoretical algorithm, which takes polynomial time, is presented to determine the minimum skew factor. Furthermore, the memory-cost analysis derives the tile size which minimizes capacity misses. Given the tile size, an efficient and general <i>array-padding</i> scheme is applied to remove conflict misses. Experiments were conducted on 16 test programs and preliminary results showed an average speedup of 1.58 and a maximum speedup of 5.06 across those test programs.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Admas, J. C. 1999. MUDPACK: Multigrid Software for Elliptic Partial Differential Equations. Available on line at http://www.scd.ucar.edu/css/software/mudpack/.
2
 
3
4
5
 
6
Andersen, B. S., Gustavson, F. G., Wasniewski, J., and Yalamov, P. Y. 1999. Recursive formulation of some dense linear algebra algorithms. In Proceedings of the SIAM Conference on Parallel Processing for Scientific Computing (San Antonio, TX).
7
 
8
 
9
 
10
Boulet, P., Dongarra, J., Robert, Y., and Vivien, F. 1999. Static tiling for heterogeneous computing platforms. Parall. Comput. 25, 547--568.
11
12
13
14
15
16
17
 
18
Collard, J.-F. 1994. Space-time transformation of while-loops using speculative execution. In Proceedings of the Scalable High Performance Computing Conference (Knoxville, TN). 429--436.
 
19
 
20
 
21
 
22
23
24
 
25
Haghighat, M. R. 1990. Symbolic dependence analysis for high performance parallelizing compilers. Ph.D. dissertation. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL.
 
26
27
 
28
29
 
30
31
 
32
33
 
34
 
35
Matula, D. and Beck, L. 1981. Smallest-last ordering and clustering and graph coloring algorithms. Tech. rep. TR CSE 8104. Department of Computer Science and Engineering, Southern Methodist University, Dallas, TX.
 
36
 
37
 
38
Object-Oriented Scientific Computing. 2001. Blitz++. Object-Oriented Scientific Computing, Available online at http://www.oonumerics.org/blitz/benchmarks/.
39
 
40
 
41
42
 
43
 
44
 
45
 
46
 
47
 
48
49
50
51
52
53
 
54
 
55
 
56
 
57

CITED BY  7

Collaborative Colleagues:
Zhiyuan Li: colleagues
Yonghong Song: colleagues