|
ABSTRACT
This paper considers various flavors of the following online problem: preprocess a text or collection of strings, so that given a query string p, all matches of p with the text can be reported quickly. In this paper we consider matches in which a bounded number of mismatches are allowed, or in which a bounded number of "don't care" characters are allowed. The specific problems we look at are: indexing, in which there is a single text t, and we seek locations where p matches a substring of t; dictionary queries, in which a collection of strings is given upfront, and we seek those strings which match p in their entirety; and dictionary matching, in which a collection of strings is given upfront, and we seek those substrings of a (long) p which match an original string in its entirety. These are all instances of an all-to-all matching problem, for which we provide a single solution.The performance bounds all have a similar character. For example, for the indexing problem with n=|t| and m=|p|, the query time for k substitutions is O(m + (c1 log n)k⁄k! + # matches), with a data structure of size O(n (c2 log n)k⁄k!) and a preprocessing time of O(n (c2 log n)k⁄k!), where c1,c2 > 1 are constants. The deterministic preprocessing assumes a weakly nonuniform RAM model; this assumption is not needed if randomization is used in the preprocessing.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
|
| |
4
|
T. Akutsu. Approximate string matching with variable length don't care characters. IEICE Trans. Information and Systems, E79-D:1353--1354, 1996.
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
Amihood Amir , Dmitry Keselman , Gad M. Landau , Moshe Lewenstein , Noa Lewenstein , Michael Rodeh, Text indexing and dictionary matching with one error, Journal of Algorithms, v.37 n.2, p.309-325, Nov. 2000
[doi> 10.1006/jagm.2000.1104]
|
| |
9
|
A. Amir, G. Landau, M. Lewenstein and D. Sokol. Dynamic pattern, static text matching. Submitted for journal publication; preliminary version, Proc. of Workshop on Algorithms and Data Structures, 2003, 340--352.
|
| |
10
|
Amihood Amir , Moshe Lewenstein , Ely Porat, Faster algorithms for string matching with k mismatches, Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, p.794-803, January 09-11, 2000, San Francisco, California, United States
|
| |
11
|
Amihood Amir , Ely Porat , Moshe Lewenstein, Approximate subset matching with Don't Cares, Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, p.305-306, January 07-09, 2001, Washington, D.C., United States
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
 |
18
|
Paolo Ferragina , S. Muthukrishnan , Mark de Berg, Multi-method dispatching: a geometric approach with applications to string matching problems, Proceedings of the thirty-first annual ACM symposium on Theory of computing, p.483-491, May 01-04, 1999, Atlanta, Georgia, United States
[doi> 10.1145/301250.301378]
|
| |
19
|
M. J. Fischer and M. S. Paterson. String matching and other products. Complexity of Computation, R. M. Karp (editor), SIAM-AMS Proceedings, 7:113--125, 1974.
|
 |
20
|
|
| |
21
|
D. Greene, M. Parnas and F. Yao. Multi-index hashing for information retrieval. Proc. of the Symposium on the Foundations of Computer Science, 1994, 722--731.
|
 |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
|
| |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
|
 |
31
|
|
| |
32
|
K. Mehlhorn. Data Structures and Efficient Algorithms, Vol. 1: Sorting and Searching, pp. 177--178. Springer Verlag, EATCS Monographs, 1984.
|
| |
33
|
M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, Mass., 1969.
|
| |
34
|
|
| |
35
|
|
| |
36
|
Shashi Shekhar , Sanjay Chawla , Siva Ravada , Andrew Fetterer , Xuan Liu , Chang-tien Lu, Spatial Databases-Accomplishments and Research Needs, IEEE Transactions on Knowledge and Data Engineering, v.11 n.1, p.45-55, January 1999
[doi> 10.1109/69.755614]
|
| |
37
|
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249--260, 1995.
|
| |
38
|
P. Weiner. Linear pattern matching algorithm. Proc. of the Symposium on Switching and Automata Theory, 1973, 1--11.
|
| |
39
|
D. E. Willard. Log-logarithmic worst-case range queries are possible in space θ(n). Information Processing Letters, 17(2):81--84, 1983.
|
| |
40
|
|
CITED BY 21
|
|
|
|
|
Amihood Amir , Yonatan Aumann , Gary Benson , Avivit Levy , Ohad Lipsky , Ely Porat , Steven Skiena , Uzi Vishne, Pattern matching with address errors: rearrangement distances, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, p.1221-1229, January 22-26, 2006, Miami, Florida
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
C. Iliopoulos , K. Perdikuri , E. Theodoridis , A. Tsakalidis , K. Tsichlas, Algorithms for extracting motifs from biological weighted sequences, Journal of Discrete Algorithms, v.5 n.2, p.229-242, June, 2007
|
|
|
Amihood Amir , Eran Chencinski , Costas Iliopoulos , Tsvi Kopelowitz , Hui Zhang, Property matching and weighted matching, Theoretical Computer Science, v.395 n.2-3, p.298-310, May, 2008
|
|
|
|
|
|
C. Epifanio , A. Gabriele , F. Mignosi , A. Restivo , M. Sciortino, Languages with mismatches, Theoretical Computer Science, v.385 n.1-3, p.152-166, October, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amihood Amir , Yonatan Aumann , Gary Benson , Avivit Levy , Ohad Lipsky , Ely Porat , Steven Skiena , Uzi Vishne, Pattern matching with address errors: Rearrangement distances, Journal of Computer and System Sciences, v.75 n.6, p.359-370, September, 2009
|
|
|
Yingling Liu , Xindong Wu , Xuegang Hu , Jun Gao , Gongqing Wu , Haiping Wang , Xiaoli Hong, Pattern matching with wildcards based on key character location, Proceedings of the 10th IEEE international conference on Information Reuse & Integration, p.167-170, August 10-12, 2009, Las Vegas, Nevada, USA
|
|
|
Amihood Amir , Yonatan Aumann , Piotr Indyk , Avivit Levy , Ely Porat, Efficient computations of l1 and l∞ rearrangement distances, Theoretical Computer Science, v.410 n.43, p.4382-4390, October, 2009
|
|
|
|
|
|
|
|