ABSTRACT
This paper describes our sampling-based profiler that exploits a processor's Hardware Performance Monitor (HPM) to collect information on running Java applications for use by the Java VM. Our profiler provides two novel features: Java-level event profiling and lightweight context-sensitive event profiling. For Java-level events, we propose new techniques that leverage the sampling facility of the HPM to generate object creation profiles and lock activity profiles. HPM sampling is the key to achieving lower overhead than profilers that do not rely on hardware assistance. Because the HPM can sample only hardware events, such as executed instructions or cache misses, we sample object creations by correlating them with the store instructions that write Java object headers. For the lock activity profile, we introduce an instrumentation-based technique called ProbeNOP, which uses a special NOP instruction whose executions are counted by the HPM. For context-sensitive event profiling, we propose a new technique called CallerChaining, which detects the calling context of HPM events from the call stack depth (the value of the stack frame pointer). We show that CallerChaining can detect calling contexts in many programs, including a large commercial application. Together, our techniques enable both programmers and runtime systems to obtain more useful information from the HPM to understand and optimize programs without adding significant runtime overhead.
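The idea of sampling object creations through header stores can be illustrated with a minimal sketch. It assumes that the JIT records a table mapping the address of the one store instruction that writes each new object's header to that object's allocation site, and that the HPM samples every N-th executed store and reports its instruction address. The names `AllocSite` and `object_creation_profile`, the addresses, and the sampling interval are all hypothetical. This is not the paper's implementation, only an illustration of the attribution step.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocSite:
    """Hypothetical allocation-site key that a JIT might record."""
    method: str
    bytecode_index: int
    class_name: str


def object_creation_profile(header_store_sites, sampled_pcs, sampling_interval):
    """Estimate object creations per allocation site from HPM store samples.

    header_store_sites: maps the address of the one JIT-emitted store that
        writes each new object's header to that object's allocation site
        (an assumed table, not a documented VM structure).
    sampled_pcs: instruction addresses reported by the HPM, one per sample,
        assuming it samples every `sampling_interval`-th store instruction.
    """
    if sampling_interval <= 0:
        raise ValueError("sampling_interval must be positive")
    hits = Counter()
    for pc in sampled_pcs:
        site = header_store_sites.get(pc)
        if site is not None:  # skip stores that do not write object headers
            hits[site] += 1
    # Each sample stands for roughly `sampling_interval` store executions.
    return {site: n * sampling_interval for site, n in hits.items()}


# Usage with made-up addresses and allocation sites.
site_a = AllocSite("Foo.bar", 12, "java.lang.String")
site_b = AllocSite("Foo.baz", 3, "java.util.ArrayList")
stores = {0x1000: site_a, 0x2040: site_b}
samples = [0x1000, 0x1000, 0x3000, 0x2040, 0x1000]
profile = object_creation_profile(stores, samples, 10_000)
for site, est in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{site.class_name} at {site.method}@{site.bytecode_index}: ~{est}")
```

Attributing samples by exact instruction address means the allocation path itself needs no extra instrumentation. The resulting counts are statistical estimates whose accuracy depends on the sampling interval and on how many samples each site receives.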