ACM Home Page
Please provide us with feedback. Feedback
The string edit distance matching problem with moves
Full text PdfPdf (1.13 MB)
Source Symposium on Discrete Algorithms archive
Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
San Francisco, California
Pages: 667 - 676  
Year of Publication: 2002
ISBN:0-89871-513-X
Authors
Graham Cormode  University of Warwick, Coventry CV4 7AL, UK
S. Muthukrishnan  AT&T Research, Florham Park, New Jersey
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 70,   Citation Count: 21
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. A well known dynamic programming algorithm takes time O(nm) to solve this problem, and it is an important open problem in Combinatorial Pattern Matching to significantly improve this bound.We relax the problem so that (a) we allow an additional operation, namely, substring moves, and (b) we approximate the string edit distance upto a factor of O(log n log*n). Our result is a near linear time deterministic algorithm for this version of the problem. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L1 vector space using a simplified parsing technique we call Edit Sensitive Parsing (ESP). This embedding is approximately distance preserving, and we show many applications of this embedding to string proximity problems including nearest neighbors, outliers, and streaming computations with strings.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
 
5
6
 
7
{Gal85} Z. Galil. Open problems in stringology. In Combinatorial Algorithms on Words, pages 1-8. Springer, 1985.
 
8
9
 
10
11
 
12
13
14
 
15
16
 
17
{MP80} W. J. Masek and M. S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20:18-31, 1980.
18
 
19
{MSU97} K. Mehlhorn, R. Sundar, and C. Uhrig. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183-198, February 1997.
 
20
{Mye86} E. W. Myers. An O(ND) difference algorithm and its variations. Algorithmica, 1:251-256, 1986.
21
 
22
{SV95} S. C. Sahinalp and U. Vishkin. Data compression using locally consistent parsing, 1995.
 
23

CITED BY  21
Collaborative Colleagues:
Graham Cormode: colleagues
S. Muthukrishnan: colleagues