ACM Home Page
Please provide us with feedback. Feedback
The string edit distance matching problem with moves
Full text PdfPdf (188 KB)
Source
ACM Transactions on Algorithms (TALG) archive
Volume 3 ,  Issue 1  (February 2007) table of contents
SECTION: 1 - Special Issue on SODA 2002 table of contents
Article No. 2  
Year of Publication: 2007
ISSN:1549-6325
Authors
Graham Cormode  AT&T Labs, Florham Park, NJ
S. Muthukrishnan  Rutgers University, Piscataway, NJ
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 39,   Downloads (12 Months): 199,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1186810.1186812
What is a DOI?

ABSTRACT

The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t.

We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log* n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP).


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Abrahamson, K. 1987. Generalized string matching. SIAM J. Comput. 16, 6, 1039--1051.
 
2
Alstrup, S., Brodal, G. S., and Rauhe, T. 2000. Pattern matching in dynamic texts. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 819--828.
 
3
Amir, A., Lewenstein, M., and Porat, E. 2000. Faster algorithms for string matching with k-mismatches. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 794--803.
 
4
Anderson, R. J., and Miller, G. L. 1991. Deterministic parallel list ranking. Algorithmica 6, 859--868.
 
5
Cole, R., and Hariharan, R. 1998. Approximate string matching: A simpler, faster algorithm. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 463--472.
 
6
Cole, R., and Vishkin, U. 1986. Deterministic coin tossing and accelerating cascades: Micro and macro techniques for designing parallel algorithms. In Proceedings of the ACM Symposium on Theory of Computing. 206--219.
 
7
Cormode, G., Paterson, M., Sahinalp, S. C., and Vishkin, U. 2000. Communication complexity of document exchange. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 197--206.
 
8
Crochemore, M., and Rytter, W. 1994. Text Algorithms. Oxford University Press.
 
9
Goel, A., Indyk, P., and Varadarajan, K. 2001. Reductions among high dimensional proximity problems. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 769--778.
 
10
Goldberg, A., Plotkin, S., and Shannon, G. 1987. Parallel symmetry-breaking in sparse graphs. In Proceedings of the ACM Symposium on Theory of Computing. 315--324.
 
11
Gusfield, D. 1997. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York.
 
12
Henzinger, M., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Tech. Rep. SRC 1998-011, DEC Systems Research Centre.
 
13
Indyk, P. 2000. Stable distributions, pseudorandom generators, embeddings and data stream computation. In IEEE Conference on the Foundations of Computer Science. 189--197.
 
14
Indyk, P., and Motwani, R. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the ACM Symposium on Theory of Computing. 604--613.
 
15
Karloff, H. 1993. Fast algorithms for approximately counting mismatches. Inf. Process. Lett. 48, 2, 53--60.
 
16
Karp, R. M., Miller, R. E., and Rosenberg, A. L. 1972. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the ACM Symposium on Theory of Computing. 125--136.
 
17
Karp, R. M., and Rabin, M. O. 1987. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31, 2, 249--260.
 
18
Landau, G., and Vishkin, U. 1989. Fast parallel and serial approximate string matching. J. Algorithms 10, 2, 157--169.
 
19
Landau, G. M., and Vishkin, U. 1986. Efficient string matching with k mismatches. Theor. Comput. Sci. 43, 239--249.
 
20
Levenshtein, V. I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physic. Doklady. 10, 8, 707--710.
 
21
Masek, W. J., and Paterson, M. S. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 18--31.
 
22
Mehlhorn, K., Sundar, R., and Uhrig, C. 1997. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica 17, 2, 183--198.
 
23
Muthukrishnan, S., Sahinalp, S. C. 2000. Approximate nearest neighbors and sequence comparison with block operations. In Proceedings of the ACM Symposium on Theory of Computing. 416--424.
 
24
Myers, E. W. 1986. An O(N D) difference algorithm and its variations. Algorithmica 1, 251--256.
 
25
Sahinalp, S. C., and Vishkin, U. 1994. Symmetry breaking for suffix tree construction. In Proceedings of the ACM Symposium on Theory of Computing. 300--309.
 
26
Sahinalp, S. C., and Vishkin, U. 1995. Data compression using locally consistent parsing. Tech. Rep., University of Maryland Department of Computer Science.
 
27
Sahinalp, S. C., and Vishkin, U. 1996. Efficient approximate and dynamic matching of patterns using a labeling paradigm. In IEEE Conference on the Foundations of Computer Science. 320--328.
 
28
Sankoff, D., and Kruskal, J. B. 1983. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley.
 
29
Shapira, D., and Storer, J. A. 2002. Edit distance with move operations. In Proceedings of the 13th Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2373. 85--98.