ACM Home Page
Please provide us with feedback. Feedback
GigaHash: scalable minimal perfect hashing for billions of urls
Full text PdfPdf (301 KB)
Source
International World Wide Web Conference archive
Proceedings of the 16th international conference on World Wide Web table of contents
Banff, Alberta, Canada
POSTER SESSION: Search table of contents
Pages: 1165 - 1166  
Year of Publication: 2007
ISBN:978-1-59593-654-7
Authors
Kumar Chellapilla  Microsoft Live Labs
Anton Mityagin  Microsoft Live Labs
Denis Charles  Microsoft Live Labs
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 53,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1242572.1242747
What is a DOI?

ABSTRACT

A minimal perfect function maps a static set of n keys on to the range of integers {0,1,2,...,n - 1}. We present a scalable high performance algorithm based on random graphs for constructing minimal perfect hash functions (MPHFs). For a set of n keys, our algorithm outputs a description of h in expected time O(n). The evaluation of h(x) requires three memory accesses for any key x and the description of h takes up 0.89n bytes (7.13n bits). This is the best (most space efficient) known result to date. Using a simple heuristic and Huffman coding, the space requirement is further reduced to 0.79n bytes (6.86n bits). We present a high performance architecture that is easy to parallelize and scales well to very large data sets encountered in internet search applications. Experimental results on a one billion URL dataset obtained from Live Search crawl data, show that the proposed algorithm (a)finds an MPHF for one billion URLs in less than 4 minutes, and (b) requires only 6.86 bits/key for the description of h.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. E. Knuth. The Art of Computer Programming: Sorting and Searching, volume 3. Addison-Wesley, 1973.
 
2
 
3
F. C. Botelho, Y. Kohayakawa, and N. Ziviani. A Practical Minimal Perfect Hashing Method. 4th Intl. Workshop on Efficient and Experimental Algorithms (WEA05), Springer-Verlag, vol. 3505, 488--500, 2005.

Collaborative Colleagues:
Kumar Chellapilla: colleagues
Anton Mityagin: colleagues
Denis Charles: colleagues