ACM Home Page
Please provide us with feedback. Feedback
A sublinear algorithm for weakly approximating edit distance
Full text PdfPdf (299 KB)
Source Annual ACM Symposium on Theory of Computing archive
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing table of contents
San Diego, CA, USA
SESSION: Session 6B table of contents
Pages: 316 - 324  
Year of Publication: 2003
ISBN:1-58113-674-9
Authors
Tugkan Batu  University of Pennsylvania, Philadelphia, PA
Funda Ergün  Case Western Reserve University
Joe Kilian  NEC Laboratories America
Avner Magen  University of Toronto, Toronto, ON, CANADA
Sofya Raskhodnikova  MIT, Cambridge, MA
Ronitt Rubinfeld  NEC Laboratories America
Rahul Sami  Yale University, New Haven, CT
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 52,   Citation Count: 12
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/780542.780590
What is a DOI?

ABSTRACT

We show how to determine whether the edit distance between two given strings is small in sublinear time. Specifically, we present a test which, given two n-character strings A and B, runs in time o(n) and with high probability returns "CLOSE" if their edit distance is O(nΑ), and "FAR" if their edit distance is Ω(n), where Α is a fixed parameter less than 1. Our algorithm for testing the edit distance works by recursively subdividing the strings A and B into smaller substrings and looking for pairs of substrings in A, B with small edit distance. To do this, we query both strings at random places using a special technique for economizing on the samples which does not pick the samples independently and provides better query and overall complexity. As a result, our test runs in time Õ(nmax(Α/2, 2Α - 1\)) for any fixed Α < 1. Our algorithm thus provides a trade-off between accuracy and efficiency that is particularly useful when the input data is very large.We also show a lower bound of Ω(nΑ/2) on the query complexity of every algorithm that distinguishes pairs of strings with edit distance at most nΑ from those with edit distance at least n/6.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
W. Chang and E. Lawler. Approximate string matching in sublinear expected time. In Proceedings of the 31st IEEE Annual Symposium on Foundations of Computer Science, pages 116--124, Saint Louis, Missouri, 1990. IEEE Computer Society Press.
 
2
 
3
4
 
5
W. J. Masek and M. S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20:18--31, 1980.
 
6
E. W. Myers. A sublinear algorithm for approximate keyword searching. Algorithmica, 12(4/5):345--374, Oct./Nov. 1994.
 
7

CITED BY  12

Collaborative Colleagues:
Tugkan Batu: colleagues
Funda Ergün: colleagues
Joe Kilian: colleagues
Avner Magen: colleagues
Sofya Raskhodnikova: colleagues
Ronitt Rubinfeld: colleagues
Rahul Sami: colleagues