| A sublinear algorithm for weakly approximating edit distance |
| Full text |
Pdf
(299 KB)
|
| Source
|
Annual ACM Symposium on Theory of Computing
archive
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
table of contents
San Diego, CA, USA
SESSION: Session 6B
table of contents
Pages: 316 - 324
Year of Publication: 2003
ISBN:1-58113-674-9
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 52, Citation Count: 12
|
|
|
ABSTRACT
We show how to determine whether the edit distance between two given strings is small in sublinear time. Specifically, we present a test which, given two n-character strings A and B, runs in time o(n) and with high probability returns "CLOSE" if their edit distance is O(nΑ), and "FAR" if their edit distance is Ω(n), where Α is a fixed parameter less than 1. Our algorithm for testing the edit distance works by recursively subdividing the strings A and B into smaller substrings and looking for pairs of substrings in A, B with small edit distance. To do this, we query both strings at random places using a special technique for economizing on the samples which does not pick the samples independently and provides better query and overall complexity. As a result, our test runs in time Õ(nmax(Α/2, 2Α - 1\)) for any fixed Α < 1. Our algorithm thus provides a trade-off between accuracy and efficiency that is particularly useful when the input data is very large.We also show a lower bound of Ω(nΑ/2) on the query complexity of every algorithm that distinguishes pairs of strings with edit distance at most nΑ from those with edit distance at least n/6.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
W. Chang and E. Lawler. Approximate string matching in sublinear expected time. In Proceedings of the 31st IEEE Annual Symposium on Foundations of Computer Science, pages 116--124, Saint Louis, Missouri, 1990. IEEE Computer Society Press.
|
| |
2
|
|
| |
3
|
Graham Cormode , Mike Paterson , Süleyman Cenk Sahinalp , Uzi Vishkin, Communication complexity of document exchange, Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, p.197-206, January 09-11, 2000, San Francisco, California, United States
|
 |
4
|
|
| |
5
|
W. J. Masek and M. S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20:18--31, 1980.
|
| |
6
|
E. W. Myers. A sublinear algorithm for approximate keyword searching. Algorithmica, 12(4/5):345--374, Oct./Nov. 1994.
|
| |
7
|
|
CITED BY 12
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Parikshit Gopalan , T. S. Jayram , Robert Krauthgamer , Ravi Kumar, Estimating the sortedness of a data stream, Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, p.318-327, January 07-09, 2007, New Orleans, Louisiana
|
|
|
|
|
|
|
|
|
|
|