ACM Home Page
Please provide us with feedback. Feedback
A dictionary for approximate string search and longest prefix search
Full text PdfPdf (222 KB)
Source Conference on Information and Knowledge Management archive
Proceedings of the 15th ACM international conference on Information and knowledge management table of contents
Arlington, Virginia, USA
SESSION: Privacy, string search table of contents
Pages: 768 - 775  
Year of Publication: 2006
ISBN:1-59593-433-2
Authors
Sreenivas Gollapudi  Microsoft Search Labs
Rina Panigrahy  Stanford University
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 96,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1183614.1183723
What is a DOI?

ABSTRACT

In this paper we propose a dictionary data structure for string search with errors where the query string may didiffer from the expected matching string by a few edits. This data structure can also be used to find the database string with the longest common prefix with few errors. Specifically, with a database of n random strings, each of length of O(m), we show how to perform string search on a query string that differs from its closest match by k edits using a data structure of linear size and query time equal to Õ(log n 2 log n klog a 2m over 2m). This means that if k < m over log a 2m log n, then the query time is Õ(1). This is of significant in practice as there are several applications where k is small relative to m. Our approach converts strings into bit vectors so that similar strings can map to similar bit vectors with small hamming distance. A simple reduction can be used to obtain similar results for approximate longest prefix search.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
3
 
4
A. Broder and M. Mitzenmacher. Network applications of bloom filters: A survey, 2002.
 
5
 
6
7
8
9
 
10
Adam Kirsch and Michael Mitzenmacher. Distance-sensitive bloom filters. In Proceedings of ALENEX 06, 2006.
11
 
12
 
13
M. D. McIllroy. Development of a spelling list. IEEE Transactions on Communications, 30(1):91--99, 1982.
 
14
15
16
 
17
18
 
19
E. Ukkonnen. Finding approximate pattern in strings. Journal of Algorithms, 6:132--137, 1985.

Collaborative Colleagues:
Sreenivas Gollapudi: colleagues
Rina Panigrahy: colleagues