|
ABSTRACT
The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Abrahamson, K. 1987. Generalized string matching. SIAM J. Comput. 16, 6, 1039--1051.
|
| |
2
|
Alstrup, S., Brodal, G. S., and Rauhe, T. 2000. Pattern matching in dynamic texts. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 819--828.
|
| |
3
|
Amir, A., Lewenstein, M., and Porat, E. 2000. Faster algorithms for string matching with k-mismatches. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 794--803.
|
| |
4
|
Anderson, R. J., and Miller, G. L. 1991. Deterministic parallel list ranking. Algorithmica 6, 859--868.
|
| |
5
|
Cole, R., and Hariharan, R. 1998. Approximate string matching: A simpler, faster algorithm. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 463--472.
|
| |
6
|
Cole, R., and Vishkin, U. 1986. Deterministic coin tossing and accelerating cascades: Micro and macro techniques for designing parallel algorithms. In Proceedings of the ACM Symposium on Theory of Computing. 206--219.
|
| |
7
|
Cormode, G., Paterson, M., Sahinalp, S. C., and Vishkin, U. 2000. Communication complexity of document exchange. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 197--206.
|
| |
8
|
Crochemore, M., and Rytter, W. 1994. Text Algorithms. Oxford University Press.
|
| |
9
|
Goel, A., Indyk, P., and Varadarajan, K. 2001. Reductions among high dimensional proximity problems. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 769--778.
|
| |
10
|
Goldberg, A., Plotkin, S., and Shannon, G. 1987. Parallel symmetry-breaking in sparse graphs. In Proceedings of the ACM Symposium on Theory of Computing. 315--324.
|
| |
11
|
Gusfield, D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York.
|
| |
12
|
Henzinger, M., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Tech. Rep. SRC 1998-011, DEC Systems Research Centre.
|
| |
13
|
Indyk, P. 2000. Stable distributions, pseudorandom generators, embeddings and data stream computation. In IEEE Conference on the Foundations of Computer Science. 189--197.
|
| |
14
|
Indyk, P., and Motwani, R. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the ACM Symposium on Theory of Computing. 604--613.
|
| |
15
|
Karloff, H. 1993. Fast algorithms for approximately counting mismatches. Inf. Process. Lett. 48, 2, 53--60.
|
| |
16
|
Karp, R. M., Miller, R. E., and Rosenberg, A. L. 1972. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the ACM Symposium on Theory of Computing. 125--136.
|
| |
17
|
Karp, R. M., and Rabin, M. O. 1987. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31, 2, 249--260.
|
| |
18
|
Landau, G., and Vishkin, U. 1989. Fast parallel and serial approximate string matching. J. Algorithms 10, 2, 157--169.
|
| |
19
|
Landau, G. M., and Vishkin, U. 1986. Efficient string matching with k mismatches. Theor. Comput. Sci. 43, 239--249.
|
| |
20
|
Levenshtein, V. I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physic. Doklady. 10, 8, 707--710.
|
| |
21
|
Masek, W. J., and Paterson, M. S. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 18--31.
|
| |
22
|
Mehlhorn, K., Sundar, R., and Uhrig, C. 1997. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17, 2, 183--198.
|
| |
23
|
Muthukrishnan, S., Sahinalp, S. C. 2000. Approximate nearest neighbors and sequence comparison with block operations. In Proceedings of the ACM Symposium on Theory of Computing. 416--424.
|
| |
24
|
Myers, E. W. 1986. An O(N D) difference algorithm and its variations. Algorithmica 1, 251--256.
|
| |
25
|
Sahinalp, S. C., and Vishkin, U. 1994. Symmetry breaking for suffix tree construction. In Proceedings of the ACM Symposium on Theory of Computing. 300--309.
|
| |
26
|
Sahinalp, S. C., and Vishkin, U. 1995. Data compression using locally consistent parsing. Tech. Rep., University of Maryland Department of Computer Science.
|
| |
27
|
Sahinalp, S. C., and Vishkin, U. 1996. Efficient approximate and dynamic matching of patterns using a labeling paradigm. In IEEE Conference on the Foundations of Computer Science. 320--328.
|
| |
28
|
Sankoff, D., and Kruskal, J. B. 1983. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.
|
| |
29
|
Shapira, D., and Storer, J. A. 2002. Edit distance with move operations. In Proceedings of the 13th Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2373. 85--98.
|
|