ACM Home Page
Please provide us with feedback. Feedback
NUMA-aware memory manager with dominant-thread-based copying GC
Full text PdfPdf (438 KB)
Source
Conference on Object Oriented Programming Systems Languages and Applications archive
Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications table of contents
Orlando, Florida, USA
SESSION: Memory table of contents
Pages 377-390  
Year of Publication: 2009
ISBN:978-1-60558-766-0
Also published in ...
Author
Takeshi Ogasawara  IBM Research - Tokyo, Yamato, Japan
Sponsor
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 26,   Downloads (12 Months): 26,   Citation Count: 0
Additional Information:

abstract   references   index terms  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1640089.1640117
What is a DOI?

ABSTRACT

We propose a novel online method of identifying the preferred NUMA nodes for objects with negligible overhead during the garbage collection time as well as object allocation time. Since the number of CPUs (or NUMA nodes) is increasing recently, it is critical for the memory manager of the runtime environment of an object-oriented language to exploit the low latency of local memory for high performance. To locate the CPU of a thread that frequently accesses an object, prior research uses the runtime information about memory accesses as sampled by the hardware. However, the overhead of this approach is high for a garbage collector.

Our approach uses the information about which thread can exclusively access an object, or the Dominant Thread (DoT). The dominant thread of an object is the thread that often most accesses an object so that we do not require memory access samples. Our NUMA-aware GC performs DoT based object copying, which copies each live object to the CPU where the dominant thread was last dispatched before GC. The dominant thread information is known from the thread stack and from objects that are locked or reserved by threads and is propagated in the object reference graph.

We demonstrate that our approach can improve the performance of benchmark programs such as SPECpower ssj2008, SPECjbb2005, and SPECjvm2008.We prototyped a NUMAaware memory manager on a modified version of IBM Java VM and tested it on a cc-NUMA POWER6 machine with eight NUMA nodes. Our NUMA-aware GC achieved performance improvements up to 14.3% and 2.0% on average over a JVM that only used the NUMA-aware allocator. The total improvement using both the NUMA-aware allocator and GC is up to 53.1% and 10.8% on average.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Advanced Micro Devices, Inc. Performance guidelines for AMD AthlonTM64 and AMD OpteronTMccNUMA multiprocessor systems, 2006.
 
2
David F. Bacon, Ravi Konuru, Chet Murthy, and Mauricio Serrano. Thin locks: featherweight synchronization for Java. In PLDI '98: Proc. ACM SIGPLAN 1998 Conf. on Programming language design and implementation, pages 258--268, 1998.
 
3
BEA Systems, Inc. BEA JRockit, r27.5, commandline reference. http://e-docs.bea.com/jrockit/jrdocs/pdf/refman.pdf, January 2008.
 
4
Martin J. Bligh, Matt Dobson, Darren Hart, and Gerrit Huizenga. Linux on NUMA systems. In Proceedings of the Linux Symposium, pages 295--306, 2004.
 
5
B. Brock, G. Carpenter, E. Chiprout, E. Elnozahy, M. Dean, D. Glasco, J. Peterson, R. Rajamony, F. Rawson, R. Rockhold, and A. Zimmerman. Windows NT in a ccNUMA system. In WINSYM'99: Proceedings of the 3rd conference on USENIX Windows NT Symposium, pages 7--7, Berkeley, CA, USA, 1999. USENIX Association.
 
6
Myra Cohen, Shiu Beng Kooi, and Witawas Srisa-an. Clustering the heap in multi-threaded applications for improved garbage collection. In GECCO '06: Proceedings of the 8th annual conference on Genetic and evolutionary computation, pages 1901--1908, New York, NY, USA, 2006. ACM. ISBN 1-59593-186-4. doi: http://doi.acm.org/10.1145/1143997.1144314.
 
7
IBM Corp. IBM @server pSeries 650 performance and tuning. http://www-03.ibm.com/systems/resources/systems_p_hardware_whitepapers_p650_perf.pdf, 2002.
 
8
IBM Corp. AIX 5L differences guide version 5.3 edition. http://www.redbooks.ibm.com/redpieces/pdfs/sg247463.pdf, 2004.
 
9
David Dagastine and Paul Hohensee. High performance Java technology in a multi-core world. http://developers.sun.com/learning/javaoneonline/2007/pdf/TS-2885.pdf, 2007.
 
10
Robert Dimpsey, Rajiv Arora, and Kean Kuiper. Java server performance: a case study of building efficient, stable JVMs. IBM Systems Journal, 39(1):151--174, 2000.
 
11
Tamar Domani, Gal Goldshtein, Elliot K. Kolodner, Ethan Lewis, Erez Petrank, and Dafna Sheinwald. Thread-local heaps for Java. In ISMM '02: Proceedings of the 3rd International Symposium on Memory Management, pages 76--87, New York, NY, USA, 2002. ACM. ISBN 1-58113-539-4. doi: http://doi.acm.org/10.1145/512429.512439.
 
12
Nikola Grcevski. Effective method for Java lock reservation for Java virtual machines that have cooperative multithreading. In 6th Workshop on Compiler-Driven Performance, Associated with CASCON2007, 2007.
 
13
Nikola Grcevski, Allan Kielstra, Kevin Stoodley, Mark G. Stoodley, and Vijay Sundaresan. Java just-intime compiler and virtual machine improvements for server and middleware applications. In Proc. the 3rd Virtual Machine Research and Technology Symposium, pages 151--162, 2004.
 
14
Martin Hirzel. Data layouts for object-oriented programs. In SIGMETRICS '07: Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 265--276, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-639-4. doi: http://doi.acm.org/10.1145/1254882.1254915.
 
15
IBM Corp. IBM developer kit and runtime environment, Java technology edition, version 6 diagnostics guide. http://download.boulder.ibm.com/ibmdl/pub/software/dw/jdk/diagnosis/diag60.pdf.
 
16
IBM Corp. WebSphere Application Server Network Deployment, version 6.1, tuning Java virtual machines. http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/tprf_tunejvm_v61.html.
 
17
Intel Corp. Intel 64 and IA-32 architectures optimization reference manual, 2009.
 
18
Kiyokuni Kawachiya, Akira Koseki, and Tamiya Onodera. Lock reservation: Java locks can mostly do without atomic operations. In Proc. ACM SIGPLAN Conf. on Object-Oriented Programming Systems, Languages and Applications, pages 130--141, 2002.
 
19
Hung Q. Le,William J. Starke, J. Stephen Fields, Francis P. O'Connell, Dung Q. Nguyen, Bruce J. Ronchetti, Wolfram M. Sauer, Eric M. Schwarz, and Michael T. (Mike) Vaden. IBM POWER6 microarchitecture. IBM J. Res. Dev., 51(6):639--662, 2007. ISSN 0018-8646.
 
20
Kyungwoo Lee and Samuel P. Midkiff. A two-phase escape analysis for parallel Java programs. In PACT'06: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 53--62, New York, NY, USA, 2006. ACM. ISBN 1-59593-264-X. doi: http://doi.acm.org/10.1145/1152154.1152166.
 
21
Kyungwoo Lee, Xing Fang, and Samuel P. Midkiff. Practical escape analyses: how good are they? In VEE '07: Proceedings of the 3rd International Conference on Virtual Execution Environments, pages 180--190, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-630-1. doi: http://doi.acm.org/10.1145/1254810.1254836.
 
22
Henry Lieberman and Carl Hewitt. A real-time garbage collector based on the lifetimes of objects. Commun. ACM, 26(6):419--429, 1983. ISSN 0001-0782. doi: http://doi.acm.org/10.1145/358141.358147.
 
23
Jaydeep Marathe and Frank Mueller. Hardware profileguided automatic page placement for ccnuma systems. In PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 90--99, New York, NY, USA, 2006. ACM. ISBN 1-59593-189-9. doi: http://doi.acm.org/10.1145/1122971.1122987.
 
24
Takeshi Ogasawara, Hideaki Komatsu, and Toshio Nakatani. TO-lock: Removing lock overhead using the owners' temporal locality. In PACT '04: Proc. 13th International Conf. on Parallel Architectures and Compilation Techniques, pages 255--266, 2004.
 
25
Mattias Persson. Java technology, IBM style: Garbage collection policies, part 1. http://http://www.ibm.com/developerworks//library/j-ibmjava2/, 2006.
 
26
Kenneth Russell and David Detlefs. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing. In Proc. ACM SIGPLAN Conf. on Object-Oriented Programming Systems, Languages and Applications, pages 263--272, 2006.
 
27
Ryan Sciampacone. A production world view of garbage collection for the Java ME and SE spaces. http://www.mm-net.org.uk/school/programme.html.
 
28
Yefim Shuf, Manish Gupta, Rajesh Bordawekar, and Jaswinder Pal Singh. Exploiting prolific types for memory management and optimizations. In POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 295--306, New York, NY, USA, 2002. ACM. ISBN 1-58113-450-9. doi: http://doi.acm.org/10.1145/503272.503300.
 
29
Bjarne Steensgaard. Thread-specific heaps for multithreaded programs. In ISMM '00: Proceedings of the 2nd International Symposium on Memory Management, pages 18--24, New York, NY, USA, 2000. ACM. ISBN 1-58113-263-8. doi: http://doi.acm.org/10.1145/362422.362432.
 
30
Sun Microsystems, Inc. Ergonomics in the 5.0 JavaTMvirtual machine. http://java.sun.com/docs/hotspot/gc5.0/ergo5.html.
 
31
Sun Microsystems, Inc. SolarisTMmemory placement optimization and Sun FireTMservers. http://www.sun.com/servers/wp/docs/mpo_v7_CUSTOMER.pdf, 2003.
 
32
Sun Microsystems, Inc. Java SE 6 HotSpotTMvirtual machine garbage collection tuning. http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html, 2008.
 
33
Sun Microsystems, Inc. Sun SPARC R enterprise T5440 server architecture, 2008.
 
34
Christian Terboven, Dieter an Mey, Dirk Schmidl, Henry Jin, and Thomas Reichstein. Data and thread affinity in openmp programs. In MAW '08: Proceedings of the 2008 workshop on Memory access on future processors, pages 377--384, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-091-3. doi: http://doi.acm.org/10.1145/1366219.1366222.
 
35
Mustafa M. Tikir and Jeffery K. Hollingsworth. NUMA-aware Java heaps for server applications. In IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers, page 108.2, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2312-9. doi: http://dx.doi.org/10.1109/IPDPS.2005.299.
 
36
Mustafa M. Tikir and Jeffrey K. Hollingsworth. Using hardware counters to automatically improve memory performance. In SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing, page 46, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7695-2153-3. doi: http://dx.doi.org/10.1109/SC.2004.64.
 
37
David Ungar. Generation scavenging: A nondisruptive high performance storage reclamation algorithm. In SDE 1: Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments, pages 157--167, New York, NY, USA, 1984. ACM. ISBN 0-89791-131-8. doi: http://doi.acm.org/10.1145/800020. 808261.