ABSTRACT
We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. The input native instruction stream to Dynamo can be dynamically generated (by a JIT, for example), or it can come from the execution of a statically compiled native binary. This paper evaluates the Dynamo system in the latter, more challenging situation, in order to emphasize the limits, rather than the potential, of the system. Our experiments demonstrate that even statically optimized native binaries can be accelerated by Dynamo, often by a significant degree. For example, the average performance of -O optimized SpecInt95 benchmark binaries created by the HP product C compiler is improved to a level comparable to their -O4 optimized version running without Dynamo. Dynamo achieves this by focusing its efforts on optimization opportunities that tend to manifest only at runtime, and hence on opportunities that might be difficult for a static compiler to exploit. Dynamo's operation is transparent in the sense that it does not depend on any user annotations or binary instrumentation, and does not require multiple runs or any special compiler, operating system, or hardware support. The Dynamo prototype presented here is a realistic implementation running on an HP PA-8000 workstation under the HPUX 10.20 operating system.