ACM Home Page
Please provide us with feedback. Feedback
String hashing for linear probing
Full text PdfPdf (442 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages 655-664  
Year of Publication: 2009
Author
Mikkel Thorup  Shannon Laboratory, Florham Park, NJ
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 24,   Downloads (12 Months): 103,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC'07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes. They also showed that with 5-universal hashing, the expected number of probes is constant. Unfortunately, we do not have 5-universal hashing for, say, variable length strings. When we want to do such complex hashing from a complex domain, the generic standard solution is that we first do collision free hashing (w.h.p.) into a simpler intermediate domain, and second do the complicated hash function on this intermediate domain. Our contribution is that for an expected constant number of linear probes, it is suffices that each key has O(1) expected collisions with the first hash function, as long as the second hash function is 5-universal. This means that the intermediate domain can be n times smaller, and such a smaller intermediate domain typically means that the overall hash function can be made simpler and at least twice as fast. The same doubling of hashing speed for O(1) expected probes follows for most domains bigger than 32-bit integers, e.g., 64-bit integers and fixed length strings. In addition, we study how the overhead from linear probing diminishes as the array gets larger, and what happens if strings are stored directly as intervals of the array. These cases were not considered by Pagh et al.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
J. R. Black, C. U. Martel, and H. Qi. Graph and hashing algorithms for modern architectures: Design and performance. In Proc. 2nd WAE, pages 37--48, 1998.
3
 
4
 
5
 
6
 
7
G. L. Heileman and W. Luo. How caching affects hashing. In Proc. 7th ALENEX, pages 141--154, 2005.
 
8
 
9
D. E. Knuth. Notes on "open" addressing, 1963. Unpublished memorandum. Available at http://citeseer.ist.psu.edu/knuth63notes.html.
 
10
 
11
12
 
13
R. Pagh. Personal communication, 2008.
 
14
 
15
H. Prodinger and W. Szpankowski. Preface for special issue on average case analysis of algorithms). Algorithmica, 22(4):363--365, 1998.
16
 
17
 
18
 
19
 
20
M. Wegman and J. Carter. New hash functions and their use in authentication and set equality. J. Comp. Syst. Sci., 22:265--279, 1981.