ACM Home Page
Please provide us with feedback. Feedback
Efficient hashing with lookups in two memory accesses
Full text PdfPdf (1.01 MB)
Source Symposium on Discrete Algorithms archive
Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
Vancouver, British Columbia
SESSION: Session 9A table of contents
Pages: 830 - 839  
Year of Publication: 2005
ISBN:0-89871-585-7
Author
Rina Panigrahy  Cisco Systems, San Jose, CA
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
: SIAM Activity Group on Discrete Mathematics
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 62,   Citation Count: 4
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

The study of hashing is closely related to the analysis of balls and bins. Azar et. al. [1] showed that instead of using a single hash function if we randomly hash a ball into two bins and place it in the smaller of the two, then this dramatically lowers the maximum load on bins. This leads to the concept of two-way hashing where the largest bucket contains O(log log n) balls with high probability. The hash look up will now search in both the buckets an item hashes to. Since an item may be placed in one of two buckets, we could potentially move an item after it has been initially placed to reduce maximum load. Using this fact, we present a simple, practical hashing scheme that maintains a maximum load of 2, with high probability, while achieving high memory utilization. In fact, with n buckets, even if the space for two items are pre-allocated per bucket, as may be desirable in hardware implementations, more than n items can be stored giving a high memory utilization. Assuming truly random hash functions, we prove the following properties for our hashing scheme.• Each lookup takes two random memory accesses, and reads at most two items per access.• Each insert takes O(log n) time and up to log log n+O(1) moves, with high probability, and constant time in expectation.• Maintains 83.75% memory utilization, without requiring dynamic allocation during inserts.We also analyze the trade-off between the number of moves performed during inserts and the maximum load on a bucket. By performing at most h moves, we can maintain a maximum load of O(log log n/h log (log log n/h)). So, even by performing one move, we achieve a better bound than by performing no moves at all.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
A. Broder and M. Mitzenmacher Using Multiple Hash Functions to Improve IP Lookups Proceedings of IEEE INFOCOM 2001, pp. 1454--1463, 2001.
 
4
 
5
 
6
 
7
A. Czumaj, C. Riley, and C. Scheideler. Perfectly Balanced Allocation In Proceedings of the 7th International Workshop on Randomization and Approximation Techniques in Computer Science (RANDOM'03), pages 240--251.
 
8
 
9
 
10
11
12
 
13
Martin Dietzfelbinger and Christoph Weidling. Cache-friendly dictionary implementations with constant lookup time and small space overhead. Submitted to STACS 2005. Also part of Christoph Weidling's PhD thesis, to be published, Technical University of Ilmenau.
 
14
Ehud Friedgut and Gil Kalai, Every monotone graph property has a sharp threshold, Proceedings of the American Mathematical Society 124 (1996), 2993--3002.
15
 
16
 
17
M. Mitzenmacher, A. Richa, and R. Sitaraman The Power of Two Random Choices: A Survey of Techniques and Results Book chapter, in Handbook of Randomized Computing: volume 1, edited by P. Pardalos, S. Rajasekaran, and J. Rolim, pp. 255--312.
 
18
19
 
20
 
21
 
22