| Quality and efficiency in high dimensional nearest neighbor search |
| Full text |
Pdf
(725 KB)
|
Source
|
International Conference on Management of Data
archive
Proceedings of the 35th SIGMOD international conference on Management of data
table of contents
Providence, Rhode Island, USA
SESSION: Research session 15: nearest neighbor search
table of contents
Pages 563-576
Year of Publication: 2009
ISBN:978-1-60558-551-2
|
|
Authors
|
|
Yufei Tao
|
Chinese University of Hong Kong, New Territories, Hong Kong
|
|
Ke Yi
|
Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
|
|
Cheng Sheng
|
Chinese University of Hong Kong, New Territories, Hong Kong
|
|
Panos Kalnis
|
King Abdullah University of Science and Technology, Soudi Arabia, Saudi Arabia
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 68, Downloads (12 Months): 245, Citation Count: 0
|
|
|
ABSTRACT
Nearest neighbor (NN) search in high dimensional space is an important problem in many applications. Ideally, a practical solution (i) should be implementable in a relational database, and (ii) its query cost should grow sub-linearly with the dataset size, regardless of the data and query distributions. Despite the bulk of NN literature, no solution fulfills both requirements, except locality sensitive hashing (LSH). The existing LSH implementations are either rigorous or adhoc. Rigorous-LSH ensures good quality of query results, but requires expensive space and query cost. Although adhoc-LSH is more efficient, it abandons quality control, i.e., the neighbor it outputs can be arbitrarily bad. As a result, currently no method is able to ensure both quality and efficiency simultaneously in practice. Motivated by this, we propose a new access method called the locality sensitive B-tree (LSB-tree) that enables fast high-dimensional NN search with excellent quality. The combination of several LSB-trees leads to a structure called the LSB-forest that ensures the same result quality as rigorous-LSH, but reduces its space and query cost dramatically. The LSB-forest also outperforms adhoc-LSH, even though the latter has no quality guarantee. Besides its appealing theoretical properties, the LSB-tree itself also serves as an effective index that consumes linear space, and supports efficient updates. Our extensive experiments confirm that the LSB-tree is faster than (i) the state of the art of exact NN search by two orders of magnitude, and (ii) the best (linear-space) method of approximate retrieval by an order of magnitude, and at the same time, returns neighbors with much better quality.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
| |
3
|
|
 |
4
|
|
 |
5
|
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger, The R*-tree: an efficient and robust access method for points and rectangles, Proceedings of the 1990 ACM SIGMOD international conference on Management of data, p.322-331, May 23-26, 1990, Atlantic City, New Jersey, United States
|
 |
6
|
Kristin P. Bennett , Usama Fayyad , Dan Geiger, Density-based indexing for approximate nearest-neighbor queries, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.233-243, August 15-18, 1999, San Diego, California, United States
[doi> 10.1145/312129.312236]
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
 |
11
|
Markus M. Breunig , Hans-Peter Kriegel , Raymond T. Ng , Jörg Sander, LOF: identifying density-based local outliers, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.93-104, May 15-18, 2000, Dallas, Texas, United States
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
 |
15
|
Mayur Datar , Nicole Immorlica , Piotr Indyk , Vahab S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the twentieth annual symposium on Computational geometry, June 08-11, 2004, Brooklyn, New York, USA
[doi> 10.1145/997817.997857]
|
 |
16
|
|
 |
17
|
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
 |
24
|
|
| |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
|
| |
31
|
|
| |
32
|
|
| |
33
|
Qin Lv , William Josephson , Zhe Wang , Moses Charikar , Kai Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
 |
34
|
|
 |
35
|
Nick Roussopoulos , Stephen Kelley , Frédéric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States
|
| |
36
|
|
|