|
ABSTRACT
A common approach for solving computational problems over a difficult metric space is to embed the "hard" metric into L1 which admits efficient algorithms and is thus considered an "easy" metric. This approach has proved successful or partially successful for important spaces such as the edit distance, but it also has inherent limitations: it is provably impossible to go below certain approximation for some metrics. We propose a new approach, of embedding the difficult space into richer host spaces, namely iterated products of standard spaces like l1 and l∞. We show that this class is rich since it contains useful metric spaces with only a constant distortion, and, at the same time, it is tractable and admits efficient algorithms. Using this approach, we obtain for example the first nearest neighbor data structure with O(log log d) approximation for edit distance in non-repetitive strings (the Ulam metric). This approximation is exponentially better than the lower bound for embedding into L1. Furthermore, we give constant factor approximation for two other computational problems. Along the way, we answer positively a question posed in [Ajtai, Jayram, Kumar, and Sivakumar, STOC 2002]. One of our algorithms has already found applications for smoothed edit distance over 0--1 strings [Andoni and Krauthgamer, ICALP 2008].
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Miklós Ajtai , T. S. Jayram , Ravi Kumar , D. Sivakumar, Approximate counting of inversions in a data stream, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
[doi> 10.1145/509907.509964]
|
 |
3
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
| |
4
|
A. Andoni. Approximate nearest neighbor problem in high dimensions. M. Eng. Thesis, Massachusetts Institute of Technology, June 2005.
|
| |
5
|
A. Andoni, K. Do Ba, and P. Indyk. Block heavy hitters. MIT Technical Report MIT-CSAIL-TR-2008-024, 2008.
|
 |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
 |
11
|
Sanjeev Arora , Satish Rao , Umesh Vazirani, Expander flows, geometric embeddings and graph partitioning, Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, p.222-231, June 13-16, 2004, Chicago, IL, USA
[doi> 10.1145/1007352.1007355]
|
| |
12
|
|
 |
13
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
 |
14
|
Tugkan Batu , Funda Ergün , Joe Kilian , Avner Magen , Sofya Raskhodnikova , Ronitt Rubinfeld , Rahul Sami, A sublinear algorithm for weakly approximating edit distance, Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, June 09-11, 2003, San Diego, CA, USA
[doi> 10.1145/780542.780590]
|
 |
15
|
|
| |
16
|
M. Charikar and R. Krauthgamer. Embedding the ulam metric into l1. Theory of Computing, 2(11):207--224, 2006.
|
 |
17
|
|
| |
18
|
|
 |
19
|
|
 |
20
|
Nikhil R. Devanur , Subhash A. Khot , Rishi Saket , Nisheeth K. Vishnoi, Integrality gaps for sparsest cut and minimum linear arrangement problems, Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, May 21-23, 2006, Seattle, WA, USA
[doi> 10.1145/1132516.1132594]
|
| |
21
|
|
| |
22
|
|
| |
23
|
Parikshit Gopalan , T. S. Jayram , Robert Krauthgamer , Ravi Kumar, Estimating the sortedness of a data stream, Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, p.318-327, January 07-09, 2007, New Orleans, Louisiana
|
| |
24
|
|
| |
25
|
|
| |
26
|
|
 |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
P. Indyk and J. Matoušek. Discrete metric spaces. CRC Handbook of Discrete and Computational Geometry, 2nd edition, 2003.
|
 |
31
|
|
| |
32
|
P. Indyk and N. Thaper. Fast color image retrieval via embeddings. Workshop on Statistical and Computational Theories of Vision (at ICCV), 2003.
|
 |
33
|
|
| |
34
|
T. Jayram and D. Woodruff. Cascaded stream aggregates. Manuscript, 2008.
|
| |
35
|
W. B. Johnson and J. Lindenstrauss, editors. Handbook of the geometry of Banach spaces. Vol. I. North-Holland Publishing Co., Amsterdam, 2001.
|
| |
36
|
|
| |
37
|
|
 |
38
|
|
| |
39
|
N. Linial. Finite metric spaces - combinatorics, geometry and algorithms. Proceedings of the International Congress of Mathematicians III, pages 573--586, 2002.
|
| |
40
|
N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215--245, 1995.
|
| |
41
|
J. I. Marden. Analyzing and Modeling Rank Data. Monographs on Statistics and Applied Probability 64. CRC Press, 1995.
|
| |
42
|
|
| |
43
|
|
 |
44
|
|
| |
45
|
|
|