|
ABSTRACT
High-level loop optimizations are necessary to achieve good performance over a wide variety of processors. Their performance impact can be significant because they involve in-depth program transformations that aim to sustain a balanced workload over the computational, storage, and communication resources of the target architecture. Therefore, it is mandatory that the compiler accurately models the target architecture as well as the effects of complex code restructuring. However, because optimizing compilers (1) use simplistic performance models that abstract away many of the complexities of modern architectures, (2) rely on inaccurate dependence analysis, and (3) lack frameworks to express complex interactions of transformation sequences, they typically uncover only a fraction of the peak performance available on many applications. We propose a complete iterative framework to address these issues. We rely on the polyhedral model to construct and traverse a large and expressive search space. This space encompasses only legal, distinct versions resulting from the restructuring of any static control loop nest. We first propose a feedback-driven iterative heuristic tailored to the search space properties of the polyhedral model. Though, it quickly converges to good solutions for small kernels, larger benchmarks containing higher dimensional spaces are more challenging and our heuristic misses opportunities for significant performance improvement. Thus, we introduce the use of a genetic algorithm with specialized operators that leverage the polyhedral representation of program dependences. We provide experimental evidence that the genetic algorithm effectively traverses huge optimization spaces, achieving good performance improvements on large loop nests.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
F. Agakov , E. Bonilla , J. Cavazos , B. Franke , G. Fursin , M. F. P. O'Boyle , J. Thomson , M. Toussaint , C. K. I. Williams, Using Machine Learning to Focus Iterative Optimization, Proceedings of the International Symposium on Code Generation and Optimization, p.295-305, March 26-29, 2006
[doi> 10.1109/CGO.2006.37]
|
| |
2
|
Nawaaz Ahmed , Nikolay Mateev , Keshav Pingali, Tiling imperfectly-nested loop nests, Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), p.31-es, November 04-10, 2000, Dallas, Texas, United States
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
C. Bastoul and P. Feautrier. Improving data locality by chunking. In Intl. Conf. on Compiler Construction (ETAPS CC 12), volume 2622, pages 320--335, Warsaw, Poland, Apr. 2003.
|
| |
7
|
A. Bernstein. Analysis of programs for parallel processing. IEEE Trans. on Electronic Computers, 15(5):757--763, Oct. 1966.
|
| |
8
|
F. Bodin, T. Kisuki, P. M. W. Knijnenburg, M. F. P. O'Boyle, and E. Rohou. Iterative compilation in a non-linear optimisation space. In W. on Profile and Feedback Directed Compilation, Paris, Oct. 1998.
|
| |
9
|
U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Intl. Conf. on Compiler Construction (ETAPS CC 17), Budapest, Hungary, Apr. 2008.
|
 |
10
|
Uday Bondhugula , Albert Hartono , J. Ramanujam , P. Sadayappan, A practical automatic polyhedral parallelizer and locality optimizer, Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, June 07-13, 2008, Tucson, AZ, USA
|
 |
11
|
Keith D. Cooper , Alexander Grosul , Timothy J. Harvey , Steven Reeves , Devika Subramanian , Linda Torczon , Todd Waterman, ACME: adaptive compilation made efficient, Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, June 15-17, 2005, Chicago, Illinois, USA
|
 |
12
|
Keith D. Cooper , Philip J. Schielke , Devika Subramanian, Optimizing for reduced code space using genetic algorithms, Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems, p.1-9, May 05-05, 1999, Atlanta, Georgia, United States
|
| |
13
|
|
| |
14
|
|
| |
15
|
P. Feautrier. Parametric integer programming. RAIRO Recherche Opérationnelle, 22(3):243--268, 1988.
|
| |
16
|
|
| |
17
|
|
| |
18
|
Sylvain Girbal , Nicolas Vasilache , Cédric Bastoul , Albert Cohen , David Parello , Marc Sigler , Olivier Temam, Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies, International Journal of Parallel Programming, v.34 n.3, p.261-317, June 2006
[doi> 10.1007/s10766-006-0012-3]
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
 |
24
|
Prasad A. Kulkarni , Stephen R. Hines , David B. Whalley , Jason D. Hiser , Jack W. Davidson , Douglas L. Jones, Fast and efficient searches for effective optimization-phase sequences, ACM Transactions on Architecture and Code Optimization (TACO), v.2 n.2, p.165-198, June 2005
[doi> 10.1145/1071604.1071607]
|
| |
25
|
M. Le Fur. Scanning parameterized polyhedron using Fourier-Motzkin elimination. Concurrency -- Practice and Experience, 8(6):445--460, 1996.
|
| |
26
|
C. Lee. UTDSP benchmark suite, 1998. http://www.eecg.toronto.edu/char‘ corinna/DSP.
|
 |
27
|
|
| |
28
|
S. Long and G. Fursin. Systematic search within an optimisation space based on unified transformation framework. IJCSE Intl. J. of Computational Science and Engineering, 2006.
|
 |
29
|
|
| |
30
|
|
| |
31
|
M. Palkovič. Enhanced Applicability of Loop Transformations. PhD thesis, T.U. Eindhoven, The Netherlands, Sept. 2007.
|
| |
32
|
S. Pop, A. Cohen, C. Bastoul, S. Girbal, P. Jouvelot, G.-A. Silber, and N. Vasilache. GRAPHITE: Loop optimizations based on the polyhedral model for GCC. In Proc. of the 4th GCC Developper's Summit, Ottawa, Canada, June 2006.
|
| |
33
|
L.-N. Pouchet, C. Bastoul, J. Cavazos, and A. Cohen. A note on the performance distribution of affine schedules. 2nd Workshop on Statistical and Machine learning approaches to ARchitectures and compilaTion (SMART'08), Göteborg, Sweden, Jan. 2008.
|
| |
34
|
|
 |
35
|
|
| |
36
|
Markus Püschel , José M. F. Moura , Bryan Singer , Jianxin Xiong , Jeremy Johnson , David Padua , Manuela Veloso , Robert W. Johnson, Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms, International Journal of High Performance Computing Applications, v.18 n.1, p.21-45, February 2004
[doi> 10.1177/1094342004041291]
|
| |
37
|
|
 |
38
|
|
| |
39
|
|
 |
40
|
|
| |
41
|
S. Triantafyllis, M. Vachharajani, and D. I. August. Compiler optimization-space exploration. In J. of Instruction-level Parallelism, volume 7, Jan. 2005.
|
| |
42
|
N. Vasilache, C. Bastoul, and A. Cohen. Polyhedral code generation in the real world. In Proc. of the Intl. Conf. on Compiler Construction (ETAPS CC 16), volume 3923, pages 185--201, Vienna, Austria, Mar. 2006. Springer-Verlag.
|
| |
43
|
|
| |
44
|
|
| |
45
|
D. K. Wilde. A library for doing polyhedral operations. Technical Report 785, IRISA, Rennes, France, 1993.
|
| |
46
|
|
| |
47
|
|
CITED BY 3
|
|
|
|
|
Manman Ren , Ji Young Park , Mike Houston , Alex Aiken , William J. Dally, A tuning framework for software-managed memory hierarchies, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
|
|
|
|
|