A new approach to parallelising tracing algorithms
Source
International Symposium on Memory Management
Proceedings of the 2009 international symposium on Memory management
Dublin, Ireland
SESSION: Paper session 1
Pages 10-19
Year of Publication: 2009
ISBN:978-1-60558-347-1
Authors
Cosmin E. Oancea  The University of Cambridge, Cambridge, United Kingdom
Alan Mycroft  The University of Cambridge, Cambridge, United Kingdom
Stephen M. Watt  The University of Western Ontario, London, ON, Canada
Sponsors
SIGPLAN: ACM Special Interest Group on Programming Languages
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1542431.1542434

ABSTRACT

Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection and marshaling. Traditional sequential algorithms use a worklist, replacing a node with its unvisited children. Previous work on parallel tracing is processor-oriented, associating one worklist with each processor: worklist insertion and removal require no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors.
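
The sequential worklist scheme described above can be illustrated with a minimal sketch. The function name `trace` and the dictionary-based graph representation are assumptions made for illustration; they are not taken from the paper.

```python
def trace(roots, children):
    """Return the set of nodes reachable from `roots`.

    `children` is a hypothetical adjacency mapping: node -> iterable of nodes.
    """
    visited = set()
    worklist = []
    for root in roots:
        if root not in visited:
            visited.add(root)
            worklist.append(root)
    while worklist:
        # Remove a node and replace it with its unvisited children.
        node = worklist.pop()
        for child in children.get(node, ()):
            if child not in visited:
                visited.add(child)
                worklist.append(child)
    return visited
```

With a single worklist and a single thread, marking a node as visited needs no synchronisation. The locking problem arises only once several processors, each with its own worklist, may reach the same node.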

This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate single-reader-single-writer forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies.
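
The memory-oriented idea can be sketched as a single-threaded simulation. This is not the paper's implementation: it assumes integer addresses as nodes, a hypothetical `SEGMENT_SIZE`, and a static ownership rule (segment `s` belongs to processor `s % num_procs`), whereas the abstract only states that at most one processor owns a worklist at a given time. The simulated processors take turns, so the sketch shows the data flow rather than real concurrency.

```python
from collections import deque

SEGMENT_SIZE = 4  # hypothetical: number of addresses per memory segment


def segment_of(addr):
    return addr // SEGMENT_SIZE


def trace_partitioned(roots, children, num_procs):
    """Simulate memory-oriented tracing; return the set of reachable nodes."""

    def owner(addr):
        # Assumption: static ownership, segment s belongs to processor s % num_procs.
        return segment_of(addr) % num_procs

    worklists = {}  # segment -> worklist holding only nodes of that segment
    # forward[i][j] is written only by processor i and read only by processor j.
    forward = [[deque() for _ in range(num_procs)] for _ in range(num_procs)]
    visited = set()  # a node is marked only by the owner of its segment

    def enqueue_local(addr):
        worklists.setdefault(segment_of(addr), deque()).append(addr)

    for root in roots:
        enqueue_local(root)

    active = True
    while active:
        active = False
        for p in range(num_procs):  # one simulated turn per processor
            # Drain the forwarding queues addressed to p.
            for q in range(num_procs):
                inbox = forward[q][p]
                while inbox:
                    enqueue_local(inbox.popleft())
                    active = True
            # Process the worklists of the segments p owns (snapshot of the dict,
            # since new segments may be added while processing).
            for seg, worklist in list(worklists.items()):
                if seg % num_procs != p:
                    continue
                while worklist:
                    active = True
                    node = worklist.popleft()
                    if node in visited:
                        continue
                    visited.add(node)
                    for child in children.get(node, ()):
                        dest = owner(child)
                        if dest == p:
                            enqueue_local(child)
                        else:
                            forward[p][dest].append(child)  # hand off to the owner
    return visited
```

Because a node is only ever examined by the processor that owns its segment, the visited check in this sketch involves no competing writers. In a real concurrent setting each `forward[i][j]` would need to be a proper single-reader-single-writer queue and termination detection would need more care.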

While it is clear that our solution can be more effective on NUMA systems, and even necessary when processor-local memory may not be addressed from other processors, it also, slightly surprisingly, often gives significantly better speed-up on modern multi-core architectures. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.


