ABSTRACT
Because processor architectures are increasingly complex, it is increasingly difficult to embed accurate machine models within compilers. As a result, compiler efficiency tends to decrease. The current trend favors top-down approaches: static compilers are progressively augmented with information from the architecture, as in profile-based, iterative, or dynamic compilation techniques. For the moment, however, only fairly elementary architectural information is used. In this article, we adopt a bottom-up approach to the architecture complexity issue: we assume we know everything about the behavior of the program on the architecture. We present a manual but systematic process for optimizing a program on a complex processor architecture using extensive dynamic analysis, and we find that a small set of run-time information is sufficient to drive an efficient process. We have experimentally observed on an Alpha 21264 that this approach can yield significant performance improvements on SPEC benchmarks, beyond peak SPEC. We are currently using this approach for optimizing customer applications.