ACM Home Page
Please provide us with feedback. Feedback
Combinatorial algorithms for nearest neighbors, near-duplicates and small-world design
Full text PdfPdf (432 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages 318-326  
Year of Publication: 2009
Authors
Yury Lifshits  Yahoo! Research
Shengyu Zhang  The Chinese University of Hong Kong
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 102,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We study the so called combinatorial framework for algorithmic problems in similarity spaces. Namely, the input dataset is represented by a comparison oracle that given three points x, y, y' answers whether y or y' is closer to x. We assume that the similarity order of the dataset satisfies the four variations of the following disorder inequality: if x is the a'th most similar object to y and y is the b'th most similar object to z, then x is among the D(a + b) most similar objects to z, where D is a relatively small disorder constant.

Though the oracle gives much less information compared to the standard general metric space model where distance values are given, one can still design very efficient algorithms for various fundamental computational tasks. For nearest neighbor search we present deterministic and exact algorithm with almost linear time and space complexity of preprocessing, and near-logarithmic time complexity of search. Then, for near-duplicate detection we present the first known deterministic algorithm that requires just near-linear time + time proportional to the size of output. Finally, we show that for any dataset satisfying the disorder inequality a visibility graph can be constructed: all outdegrees are near-logarithmic and greedy routing deterministically converges to the nearest neighbor of a target in logarithmic number of steps. The later result is the first known work-around for Navarro's impossibility of generalizing Delaunay graphs.

The technical contribution of the paper consists of handling "false positives" in data structures and an algorithmic technique up-aside-down-filter.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
4
5
 
6
7
8
 
9
10
 
11
 
12
K. L. Clarkson. Nearest neighbor queries in metric spaces. Discrete and Computational Geometry, 1999.
 
13
K. L. Clarkson. Nearest-neighbor searching and metric space dimensions. In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, MIT Press, 2006.
14
15
 
16
17
 
18
 
19
20
21
22
23
 
24
 
25
26
 
27
Y. Lifshits and D. Nowotka. Estimation of the click volume by large scale regression analysis. In CSR'07.
 
28
R. Macíyas and C. Segovia. Lipschitz functions on spaces of homogeneous type. Advances in Mathematics, 1979.
 
29
 
30
M. E. J. Newman. Detecting community structure in networks. The European Physical Journal B-Condensed Matter, 2004.
 
31
C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment. Theory Comput. Syst., 1999.
32
33
 
34

Collaborative Colleagues:
Yury Lifshits: colleagues
Shengyu Zhang: colleagues