ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Dictionary matching and indexing with errors and don't cares
Full text PdfPdf (218 KB)
Source
Annual ACM Symposium on Theory of Computing archive
Proceedings of the thirty-sixth annual ACM symposium on Theory of computing table of contents
Chicago, IL, USA
SESSION: Session 2B table of contents
Pages: 91 - 100  
Year of Publication: 2004
ISBN:1-58113-852-0
Authors
Richard Cole  New York University, NY, NY
Lee-Ad Gottlieb  New York University, NY, NY
Moshe Lewenstein  Bar-Ilan University, Ramat Gan, Israel
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 89,   Citation Count: 21
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1007352.1007374
What is a DOI?

ABSTRACT

This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution.The performance bounds all have a similar character. For example, for the indexing problem with n=|t| and m=|p|, the query time for k substitutions is O(m + (c1 log n)k⁄k! + # matches), with a data structure of size O(n (c2 log n)k⁄k!) and a preprocessing time of O(n (c2 log n)k⁄k!), where c1,c2 > 1 are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
 
4
T. Akutsu. Approximate string matching with variable length don't care characters. IEICE Trans. Information and Systems, E79-D:1353--1354, 1996.
 
5
 
6
 
7
 
8
 
9
A. Amir, G. Landau, M. Lewenstein and D. Sokol. Dynamic pattern, static text matching. Submitted for journal publication; preliminary version, Proc. of Workshop on Algorithms and Data Structures, 2003, 340--352.
 
10
 
11
 
12
 
13
 
14
 
15
16
 
17
18
 
19
M. J. Fischer and M. S. Paterson. String matching and other products. Complexity of Computation, R. M. Karp (editor), SIAM-AMS Proceedings, 7:113--125, 1974.
20
 
21
D. Greene, M. Parnas and F. Yao. Multi-index hashing for information retrieval. Proc. of the Symposium on the Foundations of Computer Science, 1994, 722--731.
22
 
23
 
24
 
25
26
 
27
 
28
 
29
 
30
31
 
32
K. Mehlhorn. Data Structures and Efficient Algorithms, Vol. 1: Sorting and Searching, pp. 177--178. Springer Verlag, EATCS Monographs, 1984.
 
33
M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, Mass., 1969.
 
34
 
35
 
36
 
37
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249--260, 1995.
 
38
P. Weiner. Linear pattern matching algorithm. Proc. of the Symposium on Switching and Automata Theory, 1973, 1--11.
 
39
D. E. Willard. Log-logarithmic worst-case range queries are possible in space θ(n). Information Processing Letters, 17(2):81--84, 1983.
 
40

CITED BY  21

Collaborative Colleagues:
Richard Cole: colleagues
Lee-Ad Gottlieb: colleagues
Moshe Lewenstein: colleagues