ABSTRACT
Loop fusion and tiling are both recognized as effective transformations for improving the memory performance of scientific applications. However, because they are sensitive to the underlying cache architecture and interact with each other, it is difficult to find a heuristic that applies them profitably across architectures. In this paper, we present a model-guided empirical tuning strategy for the profitable application of loop fusion and tiling. Our strategy is built on a detailed cost model that characterizes the interaction between the two transformations at different levels of the memory hierarchy. The novelty of our approach lies in exposing key architectural parameters within the model so that they can be tuned automatically through empirical search. Preliminary experiments with a set of applications on four different platforms show that our strategy achieves significant performance improvement over fully optimized code generated by state-of-the-art commercial compilers. The time spent searching for the best parameters is considerably less than with other search strategies.