ACM Home Page
Please provide us with feedback. Feedback
Theory of nearest neighbors indexability
Full text PdfPdf (183 KB)
Source ACM Transactions on Database Systems (TODS) archive
Volume 31 ,  Issue 3  (September 2006) table of contents
Pages: 814 - 838  
Year of Publication: 2006
ISSN:0362-5915
Authors
Uri Shaft  Oracle USA, Redwood Shores, CA
Raghu Ramakrishnan  University of Wisconsin-Madison, Madison, WI
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 99,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1166074.1166077
What is a DOI?

ABSTRACT

In this article, we consider whether traditional index structures are effective in processing unstable nearest neighbors workloads. It is known that under broad conditions, nearest neighbors workloads become unstable---distances between data points become indistinguishable from each other. We complement this earlier result by showing that if the workload for an application is unstable, you are not likely to be able to index it efficiently using (almost all known) multidimensional index structures. For a broad class of data distributions, we prove that these index structures will do no better than a linear scan of the data as dimensionality increases.Our result has implications for how experiments should be designed on index structures such as R-Trees, X-Trees, and SR-Trees: simply put, experiments trying to establish that these index structures scale with dimensionality should be designed to establish crossover points, rather than to show that the methods scale to an arbitrary number of dimensions. In other words, experiments should seek to establish the dimensionality of the dataset at which the proposed index structure deteriorates to linear scan, for each data distribution of interest; that linear scan will eventually dominate is a given.An important problem is to analytically characterize the rate at which index structures degrade with increasing dimensionality, because the dimensionality of a real data set may well be in the range that a particular method can handle. The results in this article can be regarded as a step toward solving this problem. Although we do not characterize the rate at which a structure degrades, our techniques allow us to reason directly about a broad class of index structures rather than the geometry of the nearest neighbors problem, in contrast to earlier work.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
 
5
 
6
7
8
9
 
10
 
11
12
 
13
 
14
Shaft, U. and Ramakrishnan, R. 2005. When is nearest neighbors indexable? In Proceedings of ICDT.
 
15
Smith, J. R. 1998. Query vector projection access method. In Proceedings of SPIE 3656: Storage and Retrieval for Image and Video Databases VII. 511--522.
 
16
 
17

Collaborative Colleagues:
Uri Shaft: colleagues
Raghu Ramakrishnan: colleagues