ABSTRACT
A key step in program optimization is the determination of optimal values for code optimization parameters such as cache tile sizes and loop unrolling factors. One approach, which is implemented in most compilers, is to use analytical models to determine these values. The other approach, used in library generators like ATLAS, is to perform a global empirical search over the space of parameter values.

Neither approach is completely suitable for use in general-purpose compilers that must generate high quality code for large programs running on complex architectures. Model-driven optimization may incur a performance penalty of 10-20% even for a relatively simple code like matrix multiplication. On the other hand, global search is not tractable for optimizing large programs for complex architectures because the optimization space is too large.

In this paper, we advocate a methodology for generating high-performance code without increasing search time dramatically. Our methodology has three components: (i) modeling, (ii) local search, and (iii) model refinement. We demonstrate this methodology by using it to eliminate the performance gap between code produced by a model-driven version of ATLAS described by us in prior work, and code produced by the original ATLAS system using global search.
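To make the first two components concrete, here is a minimal Python sketch that tunes a single parameter, the cache tile size NB for matrix multiplication. It assumes a deliberately simple capacity model (the largest NB for which three NB × NB tiles of 8-byte doubles fit in a cache of a given size); this is an illustrative assumption, not the model used in the paper or in ATLAS. The names `model_tile_size` and `local_search` and the `step`/`radius` parameters are hypothetical. The model supplies a starting point, and a hill-climbing search then measures only a small window around it instead of the whole parameter space. Model refinement, the third component, is not shown.

```python
from typing import Callable


def model_tile_size(cache_bytes: int, elem_bytes: int = 8) -> int:
    """Analytical step: largest NB such that three NB x NB tiles fit in cache.

    A deliberately simplified capacity model (an assumption for illustration);
    always returns at least 1.
    """
    nb = 1
    while 3 * (nb + 1) ** 2 * elem_bytes <= cache_bytes:
        nb += 1
    return nb


def local_search(start: int, measure: Callable[[int], float],
                 step: int = 4, radius: int = 16, lo: int = 1) -> int:
    """Empirical step: hill-climb over tile sizes near the model's prediction.

    Only candidates within `radius` of `start` (and >= `lo`) are measured.
    `measure` returns a cost such as run time; lower is better.
    """
    best, best_cost = start, measure(start)
    while True:
        improved = False
        # Try one step down and one step up from the current best.
        for cand in (best - step, best + step):
            if cand < lo or abs(cand - start) > radius:
                continue
            cost = measure(cand)
            if cost < best_cost:
                best, best_cost, improved = cand, cost, True
        if not improved:
            return best  # local minimum within the search window


if __name__ == "__main__":
    # Hypothetical 32 KiB cache holding 8-byte doubles.
    nb = model_tile_size(32 * 1024)
    # Stand-in cost function with an arbitrary optimum at NB = 44;
    # a real tuner would time the generated kernel instead.
    best = local_search(nb, lambda t: (t - 44) ** 2)
    print(nb, best)  # prints "36 44"
```

In practice `measure` would compile and time the generated kernel; the quadratic cost in the usage example is a stand-in with an arbitrarily chosen optimum, not measured data. The `radius` bound is what keeps the number of measurements small compared with a global search, at the price of missing optima that lie far from the model's prediction.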