| A scalable mark-sweep garbage collector on large-scale shared-memory machines |
| Full text |
Pdf
(97 KB)
|
| Source
|
Conference on High Performance Networking and Computing
archive
Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM)
table of contents
San Jose, CA
Pages: 1 - 14
Year of Publication: 1997
ISBN:0-89791-985-8
|
|
Authors
|
|
Toshio Endo
|
The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan
|
|
Kenjiro Taura
|
The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan
|
|
Akinori Yonezawa
|
The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 22, Citation Count: 15
|
|
|
ABSTRACT
This work describes implementation of a mark-sweep garbage collector (GC) for shared-memory machines and reports its performance. It is a simple ''parallel'' collector in which all processors cooperatively traverse objects in the global shared heap. The collector stops the application program during a collection and assumes a uniform access cost to all locations in the shared heap. Implementation is based on the Boehm-Demers-Weiser conservative GC (Boehm GC). Experiments have been done on Ultra Enterprise 10000 (Ultra Sparc processor 250 MHz, 64 processors). We wrote two applications, BH (an N-body problem solver) and CKY (a context free grammar parser) in a parallel extension to C++.Through the experiments, We observe that load balancing is the key to achieving scalability. A naive collector without load redistribution hardly exhibits speed-up (at most fourfold speed-up on 64 processors). Performance can be improved by dynamic load balancing, which exchanges objects to be scanned between processors, but we still observe that straightforward implementation severely limits performance. First, large objects become a source of significant load imbalance, because the unit of load redistribution is a single object. Performance is improved by splitting a large object into small pieces before pushing it onto the mark stack. Next, processors spend a significant amount of time uselessly because of serializing method for termination detection using a shared counter. This problem suddenly appeared on more than 32 processors. By implementing non-serializing method for termination detection, the idle time is eliminated and performance is improved. With all these careful implementation, we achieved average speed-up of 28.0 in BH and 28.6 in CKY on 64 processors.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Josh Barnes and Piet Hut. A hirarchical O(N log N) force-calculation algorithm. Nature, 324:446-449, 1986.
|
 |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
David L. Detlefs. Concurrent garbage collection for C++. In Topics in Advanced Language Implementation, chapter 5, pages 101-134. The MIT Press, 1991.
|
 |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
Nobuyuki Ichiyoshi and Masao Morita. A shared-memory parallel extension of KLIC and its garbage collection. In Proceedings of FGCS '94 Workshop on Parallel Logic Programming, pages 113-126, 1994.
|
| |
12
|
|
| |
13
|
|
| |
14
|
Yoshikazu Kamoshida. HOCS: A C++ extension with parallel objects and multi-threading. February 1997. (senior thesis), The University of Tokyo.
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
Kenjiro Taura, Satoshi Matsuoka, and Akinori Yonezawa. ABCL/f: A future-based polymorphic typed concurrent object-oriented language - its design and implementation -. In number 18 in Dimacs Series in Discrete Mathematics and Theoretical Computer Science, pages 275-292. the DIMACS work shop on Specification of Algorithms, 1994.
|
 |
20
|
|
| |
21
|
|
CITED BY 15
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
David Detlefs , Christine Flood , Steve Heller , Tony Printezis, Garbage-first garbage collection, Proceedings of the 4th international symposium on Memory management, October 24-25, 2004, Vancouver, BC, Canada
|
|
|
Yoav Ossia , Ori Ben-Yitzhak , Irit Goft , Elliot K. Kolodner , Victor Leikehman , Avi Owshanko, A parallel, incremental and concurrent GC for servers, ACM SIGPLAN Notices, v.37 n.5, May 2002
|
|
|
Katherine Barabash , Ori Ben-Yitzhak , Irit Goft , Elliot K. Kolodner , Victor Leikehman , Yoav Ossia , Avi Owshanko , Erez Petrank, A parallel, incremental, mostly concurrent garbage collector for servers, ACM Transactions on Programming Languages and Systems (TOPLAS), v.27 n.6, p.1097-1146, November 2005
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Simon Marlow , Tim Harris , Roshan P. James , Simon Peyton Jones, Parallel generational-copying garbage collection with a block-structured heap, Proceedings of the 7th international symposium on Memory management, June 07-08, 2008, Tucson, AZ, USA
|
|
|
|
|
|
|
|