|
ABSTRACT
We study the so called combinatorial framework for algorithmic problems in similarity spaces. Namely, the input dataset is represented by a comparison oracle that given three points x, y, y' answers whether y or y' is closer to x. We assume that the similarity order of the dataset satisfies the four variations of the following disorder inequality: if x is the a'th most similar object to y and y is the b'th most similar object to z, then x is among the D(a + b) most similar objects to z, where D is a relatively small disorder constant. Though the oracle gives much less information compared to the standard general metric space model where distance values are given, one can still design very efficient algorithms for various fundamental computational tasks. For nearest neighbor search we present deterministic and exact algorithm with almost linear time and space complexity of preprocessing, and near-logarithmic time complexity of search. Then, for near-duplicate detection we present the first known deterministic algorithm that requires just near-linear time + time proportional to the size of output. Finally, we show that for any dataset satisfying the disorder inequality a visibility graph can be constructed: all outdegrees are near-logarithmic and greedy routing deterministically converges to the nearest neighbor of a target in logarithmic number of steps. The later result is the first known work-around for Navarro's impossibility of generalizing Delaunay graphs. The technical contribution of the paper consists of handling "false positives" in data structures and an algorithmic technique up-aside-down-filter.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
 |
4
|
|
 |
5
|
|
| |
6
|
Andrei Z. Broder , Steven C. Glassman , Mark S. Manasse , Geoffrey Zweig, Syntactic clustering of the Web, Selected papers from the sixth international conference on World Wide Web, p.1157-1166, September 1997, Santa Clara, California, United States
|
 |
7
|
Andrei Z. Broder , Moses Charikar , Alan M. Frieze , Michael Mitzenmacher, Min-wise independent permutations (extended abstract), Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.327-336, May 24-26, 1998, Dallas, Texas, United States
[doi> 10.1145/276698.276781]
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
K. L. Clarkson. Nearest neighbor queries in metric spaces. Discrete and Computational Geometry, 1999.
|
| |
13
|
K. L. Clarkson. Nearest-neighbor searching and metric space dimensions. In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, MIT Press, 2006.
|
 |
14
|
|
 |
15
|
Antonio Corral , Yannis Manolopoulos , Yannis Theodoridis , Michael Vassilakopoulos, Closest pair queries in spatial databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.189-200, May 15-18, 2000, Dallas, Texas, United States
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
Eyal Kushilevitz , Rafail Ostrovsky , Yuval Rabani, Efficient search for approximate nearest neighbor in high dimensional spaces, Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.614-623, May 24-26, 1998, Dallas, Texas, United States
[doi> 10.1145/276698.276877]
|
| |
27
|
Y. Lifshits and D. Nowotka. Estimation of the click volume by large scale regression analysis. In CSR'07.
|
| |
28
|
R. Macíyas and C. Segovia. Lipschitz functions on spaces of homogeneous type. Advances in Mathematics, 1979.
|
| |
29
|
|
| |
30
|
M. E. J. Newman. Detecting community structure in networks. The European Physical Journal B-Condensed Matter, 2004.
|
| |
31
|
C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment. Theory Comput. Syst., 1999.
|
 |
32
|
|
 |
33
|
Bernard Wong , Aleksandrs Slivkins , Emin Gün Sirer, Meridian: a lightweight network location service without virtual coordinates, Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, August 22-26, 2005, Philadelphia, Pennsylvania, USA
|
| |
34
|
|
|