ABSTRACT
Poor instruction cache locality can degrade performance on modern architectures. For example, our simulation results show that eliminating all instruction cache misses improves performance by as much as 16% for a modestly sized instruction cache. In this paper, we show how to take advantage of dynamic code generation in a Java Virtual Machine (VM) to improve instruction locality at run-time. We develop a dynamic code reordering (DCR) system, a low-overhead, online approach for improving instruction locality. DCR has three optimizations: (1) interprocedural method separation; (2) intraprocedural code splitting; and (3) code padding. DCR uses the dynamic call graph and an edge profile, both of which most VMs already collect, to separate hot methods from cold methods and hot code from cold code within a method. It also inserts padding between methods to minimize conflict misses between frequent caller/callee pairs. DCR performs these optimizations incrementally, and only when the VM is recompiling a method at a higher optimization level. We implement DCR in Jikes RVM and show that its overhead is negligible. Extensive simulation and run-time experiments show that a simple code space improves average performance on a Pentium 4 by around 6% on SPEC and DaCapo Java benchmarks. These programs, however, have very small instruction cache footprints, which limits the opportunities for DCR to improve performance. Consequently, the DCR optimizations show little effect on average, sometimes degrading performance and occasionally improving it by up to 5%. Our work shows that the VM has the potential to improve instruction locality dynamically and incrementally by simply piggybacking on hotspot recompilation.
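To make the idea of hot/cold method separation concrete, the following is a minimal sketch and not the DCR implementation in Jikes RVM. It assumes a hypothetical `MethodProfile` record, a hypothetical invocation-count threshold for deciding what is hot, and a 64-byte cache line. The padding shown is plain cache-line alignment of hot methods, a simplified stand-in for the caller/callee-aware padding described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CodeLayoutSketch {
    // Assumed cache line size in bytes; not a value taken from the paper.
    static final int LINE_BYTES = 64;

    // Hypothetical profile record: method name, compiled size, invocation count.
    static final class MethodProfile {
        final String name;
        final int sizeBytes;
        final long invocations;

        MethodProfile(String name, int sizeBytes, long invocations) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.invocations = invocations;
        }
    }

    // Returns the start offset of each method in a single code space:
    // hot methods first (each aligned to a cache line), then cold methods.
    static Map<String, Integer> layout(List<MethodProfile> methods, long hotThreshold) {
        List<MethodProfile> hot = new ArrayList<>();
        List<MethodProfile> cold = new ArrayList<>();
        for (MethodProfile m : methods) {
            if (m.invocations >= hotThreshold) {
                hot.add(m);
            } else {
                cold.add(m);
            }
        }

        Map<String, Integer> offsets = new LinkedHashMap<>();
        int cursor = 0;
        for (MethodProfile m : hot) {
            cursor = alignUp(cursor); // pad so the hot method starts on a line boundary
            offsets.put(m.name, cursor);
            cursor += m.sizeBytes;
        }
        for (MethodProfile m : cold) {
            offsets.put(m.name, cursor); // cold code is packed without padding
            cursor += m.sizeBytes;
        }
        return offsets;
    }

    // Rounds offset up to the next multiple of LINE_BYTES.
    static int alignUp(int offset) {
        return (offset + LINE_BYTES - 1) / LINE_BYTES * LINE_BYTES;
    }
}
```

A real VM would apply such a decision incrementally, one method at a time, as each method is recompiled, rather than laying out all methods in a single pass as this sketch does.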