|
ABSTRACT
Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is usually achieved by developing cost models that characterize the performance of the tiled program as a function of tile sizes. All previous approaches to tile size selection (TSS) are cost model specific. Due to this they are neither extensible (e.g., to richer program classes/newer architectures) nor scalable (e.g., to multiple levels of tiling). This paper identifies positivity as a fundamental property shared by the functions and parameters commonly used in TSS models. We show how this positivity can be used as a basis to derive a TSS framework which is both efficient and scalable. We also show that almost all TSS models proposed in the literature (including those used in production compilers and auto-tuners) can be reduced to our framework.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
R. Schreiber and J. Dongarra, "Automatic blocking of nested loops," RIACS, NASA Ames Research Center, Tech. Rep. 90.38, Aug 1990.
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
K. Yotov, X. Li, G. Ren, M. J. S. Garzaran, D. Padua, K. Pingali, and P. Stodghill, "Is search really necessary to generate high-performance BLAS?" Proceedings of the IEEE, vol. 93, pp. 358--386, 2005.
|
| |
7
|
L. Renganarayanan, M. Harthi-kote, R. Dewri, and S. Rajopadhye, "Towards optimal multi-level tiling for stencil computations," in 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS) (to appear), 2007.
|
 |
8
|
Muthu Manikandan Baskaran , Uday Bondhugula , Sriram Krishnamoorthy , J. Ramanujam , Atanas Rountev , P. Sadayappan, Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
[doi> 10.1145/1345206.1345210]
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
 |
13
|
Hiroshi Ohta , Yasuhiko Saito , Masahiro Kainaga , Hiroyuki Ono, Optimal tile size adjustment in compiling general DOACROSS loop nests, Proceedings of the 9th international conference on Supercomputing, p.270-279, July 03-07, 1995, Barcelona, Spain
[doi> 10.1145/224538.224571]
|
| |
14
|
J. Ramanujam and P. Sadayappan, "Tiling multidimensional itertion spaces for multicomputers." J. Parallel Distrib. Comput., vol. 16, no. 2, pp. 108--120, 1992.
|
| |
15
|
R. Andonov, S. Balev, S. V. Rajopadhye, and N. Yanev, "Optimal semi-oblique tiling," IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 9, pp. 944--960, 2003.
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
C. Hsu and U. Kremer, "Tile selection algorithms and their performance models," CS Dept, Rutgers University, Tech. Rep. DCS-TR-401, Oct. 1999.
|
 |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
B. B. Fraguela, M. G. Carmueja, and D. Andrade, "Optimal tile size selection guided by analytical models." in PARCO, 2005, pp. 565--572.
|
| |
24
|
|
 |
25
|
|
 |
26
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
| |
27
|
|
| |
28
|
|
| |
29
|
|
 |
30
|
Jeff Bilmes , Krste Asanovic , Chee-Whye Chin , Jim Demmel, Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Proceedings of the 11th international conference on Supercomputing, p.340-347, July 07-11, 1997, Vienna, Austria
[doi> 10.1145/263580.263662]
|
| |
31
|
|
| |
32
|
|
 |
33
|
|
| |
34
|
|
| |
35
|
|
| |
36
|
|
| |
37
|
|
| |
38
|
R. Duffin, E. Peterson, and C. Zener, Geometric Programming - Theory and Applications. John Wiley, 1967.
|
| |
39
|
S. Boyd, S. J. Kim, L. Vandenberghe, and A. Hassibi, "A tutorial on Geometric Programming," To appear in Optimization and Engineering, 2006.
|
| |
40
|
|
| |
41
|
K. Esseghir, "Improving data locality for caches," Master's thesis, Rice University, September 1993.
|
| |
42
|
S. Moon and R. Saavedra, "Hyperblocking: A data reorganization method to eliminate cache conflicts in tiled loop nests," University of Southern California, Tech. Rep. TR-98-671, February 1998.
|
| |
43
|
|
| |
44
|
|
| |
45
|
|
| |
46
|
|
| |
47
|
L. Renganarayana, R. Upadrasta, and S. Rajopadhye, "Optimal ILP and register tiling: Analytical model and optimization framework," in LCPC 2005: 12th International Workshop on Languages and Compilers for Parallel Computing. Springer Verlag, 2005.
|
| |
48
|
|
 |
49
|
|
| |
50
|
|
 |
51
|
|
| |
52
|
J. Löfberg, "YALMIP: A toolbox for modeling and optimization in MATLAB," in Proceedings of the CACSD Conference, Taipei, Taiwan, 2004, available from http://control.ee.ethz.ch/~joloef/yalmip.php.
|
| |
53
|
U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "A practical and fully automatic polyhedral program optimization system," in ACM SIGPLAN PLDI, Jun. 2008.
|
 |
54
|
|
 |
55
|
|
 |
56
|
Siddhartha Chatterjee , Erin Parker , Philip J. Hanlon , Alvin R. Lebeck, Exact analysis of the cache behavior of nested loops, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.286-297, June 2001, Snowbird, Utah, United States
|
| |
57
|
B. Meister and S. Verdoolaege, "Polynomial approximations in the polytope model: Bringing the power of quasi-polynomials to the masses," in ODES-6: 6th Workshop on Optimizations for DSP and Embedded Systems, Apr 2008, pp. 45--54.
|
|