Allocation wall: a limiting factor of Java applications on emerging multi-core platforms
Source
Conference on Object Oriented Programming Systems Languages and Applications
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Orlando, Florida, USA
SESSION: Memory
Pages 361-376  
Year of Publication: 2009
ISBN: 978-1-60558-766-0
Authors
Yi Zhao  IBM, Beijing, China
Jin Shi  Tsinghua University, Beijing, China
Kai Zheng  IBM, Beijing, China
Haichuan Wang  IBM, Beijing, China
Haibo Lin  IBM, Beijing, China
Ling Shao  IBM, Beijing, China
Sponsor
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1640089.1640116

ABSTRACT

Multi-core processors are widely used in computer systems. Because processor performance greatly exceeds memory performance, the memory wall has become a limiting factor. It is therefore important to understand how the large speed disparity between processor and memory affects the performance and scalability of Java applications on emerging multi-core platforms.

In this paper, we study two popular Java benchmarks, SPECjbb2005 and SPECjvm2008, on multi-core platforms including Intel Clovertown and AMD Phenom. We focus on the "partially scalable" benchmark programs: with a small number of CPU cores these programs scale perfectly, but when more cores and software threads are used, the slope of the scalability curve drops sharply.

We identify a strong correlation among scalability, object allocation rate, and memory bus write traffic in our experiments with the partially scalable programs. These applications allocate large amounts of memory and consume almost all of the memory write bandwidth on our hardware platforms. Because the write bandwidth is so limited, we propose the following hypothesis: for allocation-intensive Java applications on emerging multi-core platforms, scalability and performance are limited by object allocation, as if these applications were running into an "allocation wall".
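
The hypothesis above can be illustrated with a first-order saturation model. This is a simplified sketch and not the model used in the paper: it assumes that each thread allocates at a fixed rate, that every allocated byte eventually costs one byte of memory write traffic, and that aggregate throughput is capped once the threads together reach the write bandwidth. The class name, method names, and the numbers in `main` are hypothetical and chosen only for illustration.

```java
public class AllocationWallModel {

    // Aggregate allocation throughput for `threads` workers, capped by the
    // available memory write bandwidth (both in the same arbitrary units).
    static double throughput(int threads, double perThreadAllocRate, double writeBandwidth) {
        return Math.min(threads * perThreadAllocRate, writeBandwidth);
    }

    // Speedup relative to a single thread under the same cap.
    static double speedup(int threads, double perThreadAllocRate, double writeBandwidth) {
        return throughput(threads, perThreadAllocRate, writeBandwidth)
                / throughput(1, perThreadAllocRate, writeBandwidth);
    }

    public static void main(String[] args) {
        // Hypothetical values: each thread allocates 1 unit/s, the bus sustains 4 units/s.
        double perThreadAllocRate = 1.0;
        double writeBandwidth = 4.0;
        for (int threads = 1; threads <= 8; threads++) {
            System.out.printf("threads=%d speedup=%.1f%n",
                    threads, speedup(threads, perThreadAllocRate, writeBandwidth));
        }
    }
}
```

Under these assumed values the modeled speedup grows linearly up to four threads and stays flat afterwards, which is the shape the abstract describes as "partially scalable".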

To verify this hypothesis, we perform several experiments: measuring key architecture-level metrics, writing a micro-benchmark program, and studying the effect of modifying some of the "partially scalable" programs. All of the experiments strongly suggest the existence of the allocation wall.
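
The abstract does not describe the micro-benchmark itself, so the following is only a minimal sketch of what an allocation-intensive scaling test could look like, not the authors' program. It runs the same allocation loop on an increasing number of threads and reports the aggregate allocation rate. The class name, the object size, and the iteration count are assumptions; a real measurement would also need warm-up runs, a fixed heap configuration, and hardware counters for memory bus traffic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AllocationScalingSketch {
    // Hypothetical parameters, not taken from the paper.
    static final int OBJECT_BYTES = 1024;
    static final long ALLOCATIONS_PER_THREAD = 200_000;

    // Allocates short-lived arrays and writes one byte into each, returning a
    // checksum so the loop has an observable result.
    static long allocateLoop(long allocations, int objectBytes) {
        long checksum = 0;
        for (long i = 0; i < allocations; i++) {
            byte[] block = new byte[objectBytes];
            int index = (int) (i % objectBytes);
            block[index] = (byte) i;
            checksum += block[index];
        }
        return checksum;
    }

    // Runs the loop on `threads` workers and returns the aggregate allocation
    // rate in megabytes per second.
    static double measure(int threads, long allocationsPerThread, int objectBytes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> task = () -> allocateLoop(allocationsPerThread, objectBytes);
            List<Future<Long>> results = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(task));
            }
            for (Future<Long> result : results) {
                result.get(); // wait for every worker to finish
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            double megabytes = (double) threads * allocationsPerThread * objectBytes / (1024.0 * 1024.0);
            return megabytes / seconds;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int maxThreads = Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= maxThreads; threads++) {
            System.out.printf("threads=%d allocated MB/s=%.1f%n",
                    threads, measure(threads, ALLOCATIONS_PER_THREAD, OBJECT_BYTES));
        }
    }
}
```

If the allocation rate stops growing well before the thread count reaches the core count, that is consistent with a bandwidth limit, though this sketch alone cannot distinguish it from garbage collection or scheduling effects.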

