|
ABSTRACT
Scientific programs often include multiple loops over the same data; interleaving parts of different loops may greatly improve performance. We exploit this in a compiler for Titanium, a dialect of Java. Our compiler combines reordering optimizations such as loop fusion and tiling with storage optimizations such as array contraction (eliminating or reducing the size of temporary arrays).The programmers we have in mind are willing to spend some time tuning their code and their compiler parameters. Given that, and the difficulty in statically selecting parameters such as tile sizes, it makes sense to provide automatic parameter searching alongside the compiler. Our strategy is to optimize aggressively but to expose the compiler's decisions to external control. We double or triple the performance of Gauss-Seidel relaxation and multi-grid (versus an optimizing compiler without tiling and array contraction), and we argue that ours is the best compiler for that kind of program.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Jeff Bilmes , Krste Asanovic , Chee-Whye Chin , Jim Demmel, Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, Proceedings of the 11th international conference on Supercomputing, p.340-347, July 07-11, 1997, Vienna, Austria
[doi> 10.1145/263580.263662]
|
| |
2
|
W. L. Briggs. A Multigrid Tutorial. SIAM, 1987.
|
 |
3
|
Doug Burger , James R. Goodman , Alain Kägi, Memory bandwidth limitations of future microprocessors, Proceedings of the 23rd annual international symposium on Computer architecture, p.78-89, May 22-24, 1996, Philadelphia, Pennsylvania, United States
|
| |
4
|
Larry Carter, Jeanne Ferrante, Susan Flynn Hummel, Bowen Alpern, Kang-Su Gatlin. Hierarchical Tiling: A Methodology for High Performance. UCSD Technical Report CS96-508, November 1996.
|
| |
5
|
|
| |
6
|
C. C. Douglas et al. Maximizing Cache Memory Usage for Multigrid Algorithms. In Z. Chen, R. E. Ewing and Z.-C. Shi, editors, Multiphase Flows and Transport in Porous Media: State of the Art, Springer-Verlag, Lecture Notes in Physics, Berlin, 2000.
|
| |
7
|
FFTW. http://www.fftw.org/.
|
| |
8
|
Paul N. Hilfinger , Dan Bonachea , David Gay , Susan Graham , Ben Liblit , Geoff Pike , Katherine Yelick, Titanium Language Reference Manual, University of California at Berkeley, Berkeley, CA, 2001
|
| |
9
|
S. Flynn Hummel, I. Banicescu, C. Wang, and J. Wein. Load Balancing and Data Locality via Fractiling: An Experimental Study. In Boleslaw K. Szymanski and Balaram Sinharoy, editors, Proc. Third Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, pages 85--89. Kluwer Academic Publishers, Boston, MA, 1995.
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
| |
13
|
Wayne Kelly , Vadim Maslov , William Pugh , Evan Rosser , Tatiana Shpeisman , David Wonnacott, The Omega Library interface guide, University of Maryland at College Park, College Park, MD, 1995
|
| |
14
|
T. Kisuki, P. M. W. Knijnenburg, K. Gallivan, and M. F. P. O'Boyle. The Effect of Cache Models on Iterative Compilation for Combined Tiling and Unrolling. In Proc. FDDO-3, pages 31--40, 2000.
|
| |
15
|
T. Kisuki, P. M. W. Knijnenburg, and M. F. P. O'Boyle. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation. Technical Report 2000-07, LIACS, Leiden University, 2000.
|
| |
16
|
P. M. W. Knijnenburg , T. Kisuki , M. F. P. O'Boyle, Iterative compilation, Embedded processor design challenges: systems, architectures, modeling, and simulation-SAMOS, Springer-Verlag New York, Inc., New York, NY, 2002
|
 |
17
|
Induprakas Kodukula , Nawaaz Ahmed , Keshav Pingali, Data-centric multi-level blocking, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.346-357, June 16-18, 1997, Las Vegas, Nevada, United States
|
 |
18
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
| |
19
|
|
 |
20
|
|
| |
21
|
M. F. P. O'Boyle, P. M. W. Knijnenburg, and G. G. Fursin. Feedback Assisted Iterative Compilation. Preprint, 2000.
|
| |
22
|
|
| |
23
|
G. Pike, L. Semenzato, P. Colella, P. Hilfinger. Parallel 3D Adaptive Mesh Refinement in Titanium. In Proceedings of the SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX, March 1999.
|
| |
24
|
|
 |
25
|
|
 |
26
|
Michelle Mills Strout , Larry Carter , Jeanne Ferrante , Beth Simon, Schedule-independent storage mapping for loops, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.24-33, October 02-07, 1998, San Jose, California, United States
|
 |
27
|
William Thies , Frédéric Vivien , Jeffrey Sheldon , Saman Amarasinghe, A unified framework for schedule and storage optimization, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.232-242, June 2001, Snowbird, Utah, United States
|
| |
28
|
|
| |
29
|
|
 |
30
|
|
CITED BY 7
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
K. Yelick , P. Hilfinger , S. Graham , D. Bonachea , J. Su , A. Kamil , K. Datta , P. Colella , T. Wen, Parallel Languages and Compilers: Perspective From the Titanium Experience, International Journal of High Performance Computing Applications, v.21 n.3, p.266-290, August 2007
|
|
|
|
|