|
ABSTRACT
Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
U. Acar, E. Berger, R. Blumofe, and D. Papadopoulos. Hood: A threads library for multiprogrammed multiprocessors. http://www.cs.utexas.edu/users/hood, Sept. 1999.
|
| |
2
|
J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324:446--449, 1986.
|
| |
3
|
bCandid.com, Inc. http://www.bcandid.com.
|
| |
4
|
|
| |
5
|
B. Bigler, S. Allan, and R. Oldehoeft. Parallel dynamic storage allocation. International Conference on Parallel Processing, pages 272-275, 1985.
|
| |
6
|
R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS), pages 356--368, Santa Fe, New Mexico, Nov. 1994.
|
| |
7
|
Coyote Systems, Inc. http://www.coyotesystems.com.
|
| |
8
|
|
| |
9
|
W. Gloger. Dynamic memory allocator implementations in linux system libraries. http:llwww.dent.med.uni-muenehen.del" wmglo/malloc-slides.html.
|
| |
10
|
A. Gottlieb and J. Wilson. Using the buddy system for concurrent memory allocation. Technical Report System Software Note 6, Courant Institute, 1981.
|
| |
11
|
A. Gottlieb and J. Wilson. Parallelizing the usual buddy algorithm. Technical Report System Software Note 37, Courant Institute, 1982.
|
 |
12
|
Dirk Grunwald , Benjamin Zorn , Robert Henderson, Improving the cache locality of memory allocation, Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, p.177-186, June 21-25, 1993, Albuquerque, New Mexico, United States
|
| |
13
|
|
| |
14
|
A. K. Iyengar. Parallel dynamic storage allocation algorithms. In Fifth IEEE Symposium on Parallel and Distributed Processing. IEEE Press, 1993.
|
 |
15
|
|
| |
16
|
T. Johnson. A concurrent fast-fits memory manager. Technical Report TR91-009, University of Florida, Department of CIS, 1991.
|
| |
17
|
T. Johnson and T. Davis. Space efficient parallel buddy memory management. Technical Report TR92-008, University of Florida, Department of CIS, 1992.
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
M. R. Krishnan. Heap: Pleasures and pains. Microsoft Developer Newsletter, Feb. 1999.
|
 |
22
|
|
| |
23
|
D. Lea. A memory allocator. http://g.oswego.edu/dl/html/malloe.html.
|
| |
24
|
B. Lewis. comp. programming, t h r e a d s FAQ. http://www.lambdacs.conffnewsgroup/FAQ.html.
|
| |
25
|
P. E. McKenney and J. Slingwine. Efficient kernel memory allocation on shared-memory multiprocessor. In USENIX Association, editor, Proceedings of the Winter 1993 USENIX Conference: January 25-29, 1993, San Diego, California, USA, pages 295-305, Berkeley, CA, USA, Winter 1993. USENIX.
|
| |
26
|
MicroQuill, Inc. http://www.mieroquill.eom.
|
| |
27
|
MySQL, Inc. The mysql database manager. http://www.mysql.org.
|
 |
28
|
|
| |
29
|
J. M. Robson. Worst case fragmentation of first fit and best fit storage allocation strategies. ACM Computer Journal, 20(3):242-244, Aug. 1977.
|
| |
30
|
SGI. The standard template library for c++: Allocators. http :llwww.sg).comlTechno l ogy lS TLI All ocators.html.
|
| |
31
|
Standard Performance Evaluation Corporation. SPECweb99. http:l/www.spee.orglosglweb99L
|
| |
32
|
|
| |
33
|
D. Stein and D. Shah. Implementing lightweight threads. In Proceedings of the 1992 USENIX Summer Conference, pages 1-9, 1992.
|
| |
34
|
H. Stone. Parallel memory allocation using the FETCH-AND-ADD instruction. Technical Report RC 9674, IBM T. J. Watson Research Center, Nov. 1982.
|
| |
35
|
Time-Warner/AOL, Inc. AOLserver 3.0. http://www.aolserver.com.
|
| |
36
|
|
| |
37
|
|
|