|
ABSTRACT
Static compiler optimizations can hardly cope with the complex run-time behavior and hardware components interplay of modern processor architectures. Multiple architectural phenomena occur and interact simultaneously, which requires the optimizer to combine multiple program transformations. Whether these transformations are selected through static analysis and models, runtime feedback, or both, the underlying infrastructure must have the ability to perform long and complex compositions of program transformations in a flexible manner. Existing compilers are ill-equipped to perform that task because of rigid phase ordering, fragile selection rules using pattern matching, and cumbersome expression of loop transformations on syntax trees. Moreover, iterative optimization emerges as a pragmatic and general means to select an optimization strategy via machine learning and operations research. Searching for the composition of dozens of complex, dependent, parameterized transformations is a challenge for iterative approaches.The purpose of this article is threefold: (1) to facilitate the automatic search for compositions of program transformations, introducing a richer framework which improves on classical polyhedral representations, suitable for iterative optimization on a simpler, structured search space, (2) to illustrate, using several examples, that syntactic code representations close to the operational semantics hamper the composition of transformations, and (3) that complex compositions of transformations can be necessary to achieve significant performance benefits. The proposed framework relies on a unified polyhedral representation of loops and statements. The key is to clearly separate four types of actions associated with program transformations: iteration domain, schedule, data layout and memory access functions modifications. The framework is implemented within the Open64/ORC compiler, aiming for native IA64, AMD64 and IA32 code generation, along with source-to-source optimization of Fortran90, C and C++.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Nawaaz Ahmed , Nikolay Mateev , Keshav Pingali, Synthesizing transformations for locality enhancement of imperfectly-nested loop nests, Proceedings of the 14th international conference on Supercomputing, p.141-152, May 08-11, 2000, Santa Fe, New Mexico, United States
[doi> 10.1145/335231.335245]
|
 |
2
|
|
| |
3
|
|
| |
4
|
C. Bastoul and P. Feautrier. Improving data locality by chunking. In CC'12 Intl. Conference on Compiler Construction, LNCS 2622, pages 320--335, Warsaw, Poland, april 2003.
|
| |
5
|
C. Bastoul and P. Feautrier. More legal transformations for locality. In Euro-Par'10, number 3149 in LNCS, pages 272--283, Pisa, Aug. 2004.
|
 |
6
|
|
| |
7
|
William Blume , Ramon Doallo , Rudolf Eigenmann , John Grout , Jay Hoeflinger , Thomas Lawrence , Jaejin Lee , David Padua , Yunheung Paek , Bill Pottenger , Lawrence Rauchwerger , Peng Tu, Parallel Programming with Polaris, Computer, v.29 n.12, p.78-82, December 1996
[doi> 10.1109/2.546612]
|
| |
8
|
F. Chow. Maximizing application performance through interprocedural optimization with the pathscale eko compiler suite. http://www.pathscale.com/whitepapers.html, Aug. 2004.
|
| |
9
|
C. Coarfa, F. Zhao, N. Tallent, J. Mellor-Crummey, and Y. Dotsenko. Open-source compiler technology for source-to-source optimization. http://www.cs.rice.edu/~johnmc/research.html (project page).
|
| |
10
|
A. Cohen, S. Girbal, and O. Temam. A polyhedral approach to ease the composition of program transformations. In Euro-Par'04, number 3149 in LNCS, Pisa, Italy, Aug. 2004. Springer-Verlag.
|
| |
11
|
K. D. Cooper, M. W. Hall, R. T. Hood, K. Kennedy, K. S. McKinley, J. M. Mellor-Crummey, L. Torczon, and S. K. Warren. The ParaScope parallel programming environment. Proceedings of the IEEE, 81(2):244--263, 1993.
|
| |
12
|
|
| |
13
|
|
| |
14
|
Mary W. Hall , Jennifer M. Anderson , Saman P. Amarasinghe , Brian R. Murphy , Shih-Wei Liao , Edouard Bugnion , Monica S. Lam, Maximizing Multiprocessor Performance with the SUIF Compiler, Computer, v.29 n.12, p.84-89, December 1996
[doi> 10.1109/2.546613]
|
 |
15
|
|
| |
16
|
KAP C/OpenMP for Tru64 UNIX and KAP DEC Fortran for Digital UNIX. http://www.hp.com/techsevers/software/kap.html.
|
| |
17
|
W. Kelly. Optimization within a unified transformation framework. Technical Report CS-TR-3725, University of Maryland, 1996.
|
| |
18
|
|
| |
19
|
T. Kisuki, P. Knijnenburg, K. Gallivan, and M. O'Boyle. The effect of cache models on iterative compilation for combined tiling and unrolling. In Parallel Architectures and Compilation Techniques (PACT'00). IEEE Computer Society Press, Oct. 2001.
|
| |
20
|
T. Kisuki, P. Knijnenburg, M. O'Boyle, and H. Wijshoff. Iterative compilation in program optimization. In Proc. CPC'10 (Compilers for Parallel Computers), pages 35--44, 2000.
|
| |
21
|
|
 |
22
|
|
 |
23
|
|
 |
24
|
|
| |
25
|
|
| |
26
|
M. O'Boyle, P. Knijnenburg, and G. Fursin. Feedback assisted iterative compiplation. In Proc. LCR, 2000.
|
| |
27
|
Open research compiler. http://ipf-orc.sourceforge.net.
|
| |
28
|
|
| |
29
|
|
| |
30
|
G.-R. Perrin and A. Darte, editors. The Data Parallel Programming model. Number 1132 in LNCS. Springer-Verlag, 1996.
|
| |
31
|
A. Phansalkar, A. Joshi, L. Eeckhout, and L. John. Four generations of SPEC CPU benchmarks: what has changed and what has not. Technical Report TR-041026-01-1, University of Texas Austin, 2004.
|
| |
32
|
|
| |
33
|
Standard performance evaluation corp. http://www.spec.org.
|
| |
34
|
|
| |
35
|
|
 |
36
|
Kamen Yotov , Xiaoming Li , Gang Ren , Michael Cibulskis , Gerald DeJong , Maria Garzaran , David Padua , Keshav Pingali , Paul Stodghill , Peng Wu, A comparison of empirical and model-driven optimization, Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, June 09-11, 2003, San Diego, California, USA
|
CITED BY 10
|
|
|
|
|
Albert Cohen , Sébastien Donadio , Maria-Jesus Garzaran , Christoph Herrmann , Oleg Kiselyov , David Padua, In search of a program generator to implement generic transformations for high-performance computing, Science of Computer Programming, v.62 n.1, p.25-46, September 2006
|
|
|
Sylvain Girbal , Nicolas Vasilache , Cédric Bastoul , Albert Cohen , David Parello , Marc Sigler , Olivier Temam, Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies, International Journal of Parallel Programming, v.34 n.3, p.261-317, June 2006
|
|
|
Nicolas Vasilache , Cedric Bastoul , Albert Cohen , Sylvain Girbal, Violated dependence analysis, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
|
|
|
|
|
|
Javed Absar , Min Li , Praveen Raghavan , Andy Lambrechts , Murali Jayapala , Arnout Vandecappelle , Francky Catthoor, Locality optimization in wireless applications, Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis, September 30-October 03, 2007, Salzburg, Austria
|
|
|
Manman Ren , Ji Young Park , Mike Houston , Alex Aiken , William J. Dally, A tuning framework for software-managed memory hierarchies, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
Hans Zima , Mary Hall , Chun Chen , Jaqueline Chame, Model-guided autotuning of high-productivity languages for petascale computing, Proceedings of the 18th ACM international symposium on High performance distributed computing, p.151-166, June 11-13, 2009, Garching, Germany
|
|