| Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces |
| Full text |
Pdf
(253 KB)
|
Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
table of contents
Singapore, Singapore
SESSION: High-performance & high dimensional indexing
table of contents
Pages 123-130
Year of Publication: 2008
ISBN:978-1-60558-164-4
|
|
Authors
|
|
Wei Dong
|
Princeton University, Princeton, NJ, USA
|
|
Moses Charikar
|
Princeton University, Princeton, NJ, USA
|
|
Kai Li
|
Princeton University, Princeton, NJ, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 29, Downloads (12 Months): 220, Citation Count: 0
|
|
|
ABSTRACT
Efficient similarity search in high-dimensional spaces is important to content-based retrieval systems. Recent studies have shown that sketches can effectively approximate L1 distance in high-dimensional spaces, and that filtering with sketches can speed up similarity search by an order of magnitude. It is a challenge to further reduce the size of sketches, which are already compact, without compromising accuracy of distance estimation. This paper presents an efficient sketch algorithm for similarity search with L2 distances and a novel asymmetric distance estimation technique. Our new asymmetric estimator takes advantage of the original feature vector of the query to boost the distance estimation accuracy. We also apply this asymmetric method to existing sketches for cosine similarity and L1 distance. Evaluations with datasets extracted from images and telephone records show that our L2 sketch outperforms existing methods, and the asymmetric estimators consistently improve the accuracy of different sketch methods. To achieve the same search quality, asymmetric estimators can reduce the sketch size by 10% to 40%.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
 |
2
|
|
 |
3
|
Mayur Datar , Nicole Immorlica , Piotr Indyk , Vahab S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the twentieth annual symposium on Computational geometry, June 08-11, 2004, Brooklyn, New York, USA
[doi> 10.1145/997817.997857]
|
| |
4
|
|
| |
5
|
I. Fodor. A survey of dimension reduction techniques.
|
| |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
 |
11
|
Qin Lv , William Josephson , Zhe Wang , Moses Charikar , Kai Li, Efficient filtering with sketches in the ferret toolkit, Proceedings of the 8th ACM international workshop on Multimedia information retrieval, October 26-27, 2006, Santa Barbara, California, USA
[doi> 10.1145/1178677.1178715]
|
| |
12
|
SIFT demo. http://www.cs.ubc.ca/~lowe/keypoints/.
|
| |
13
|
SWITCHBOARD-1 Release 2. http://www.ldc.upenn.edu/Catalog/docs/switchboard/.
|
| |
14
|
|
 |
15
|
Zhe Wang , Wei Dong , William Josephson , Qin Lv , Moses Charikar , Kai Li, Sizing sketches: a rank-based analysis for similarity search, Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, June 12-16, 2007, San Diego, California, USA
|
| |
16
|
|
|