| A dictionary for approximate string search and longest prefix search |
| Full text |
Pdf
(222 KB)
|
| Source
|
Conference on Information and Knowledge Management
archive
Proceedings of the 15th ACM international conference on Information and knowledge management
table of contents
Arlington, Virginia, USA
SESSION: Privacy, string search
table of contents
Pages: 768 - 775
Year of Publication: 2006
ISBN:1-59593-433-2
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 17, Downloads (12 Months): 96, Citation Count: 0
|
|
|
ABSTRACT
In this paper we propose a dictionary data structure for string search with errors where the query string may didiffer from the expected matching string by a few edits. This data structure can also be used to find the database string with the longest common prefix with few errors. Specifically, with a database of n random strings, each of length of O(m), we show how to perform string search on a query string that differs from its closest match by k edits using a data structure of linear size and query time equal to Õ(log n 2 log n klog a 2m over 2m). This means that if k < m over log a 2m log n, then the query time is Õ(1). This is of significant in practice as there are several applications where k is small relative to m. Our approach converts strings into bit vectors so that similar strings can map to similar bit vectors with small hamming distance. A simple reduction can be used to obtain similar results for approximate longest prefix search.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
A. Broder and M. Mitzenmacher. Network applications of bloom filters: A survey, 2002.
|
| |
5
|
|
| |
6
|
|
 |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
Adam Kirsch and Michael Mitzenmacher. Distance-sensitive bloom filters. In Proceedings of ALENEX 06, 2006.
|
 |
11
|
Eyal Kushilevitz , Rafail Ostrovsky , Yuval Rabani, Efficient search for approximate nearest neighbor in high dimensional spaces, Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.614-623, May 24-26, 1998, Dallas, Texas, United States
[doi> 10.1145/276698.276877]
|
| |
12
|
|
| |
13
|
M. D. McIllroy. Development of a spelling list. IEEE Transactions on Communications, 30(1):91--99, 1982.
|
| |
14
|
|
 |
15
|
|
 |
16
|
|
| |
17
|
|
 |
18
|
Ion Stoica , Robert Morris , David Karger , M. Frans Kaashoek , Hari Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.149-160, August 2001, San Diego, California, United States
|
| |
19
|
E. Ukkonnen. Finding approximate pattern in strings. Journal of Algorithms, 6:132--137, 1985.
|
|