| Modeling LSH for performance tuning |
| Full text |
Pdf
(385 KB)
|
Source
|
Conference on Information and Knowledge Management
archive
Proceeding of the 17th ACM conference on Information and knowledge management
table of contents
Napa Valley, California, USA
SESSION: DB: indexing and physical query optimization
table of contents
Pages 669-678
Year of Publication: 2008
ISBN:978-1-59593-991-3
|
|
Authors
|
|
Wei Dong
|
Princeton University, Princeton, NJ, USA
|
|
Zhe Wang
|
Princeton University, Princeton, NJ, USA
|
|
William Josephson
|
Princeton University, Princeton, NJ, USA
|
|
Moses Charikar
|
Princeton University, Princeton, NJ, USA
|
|
Kai Li
|
Princeton University, Princeton, NJ, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 8, Downloads (12 Months): 102, Citation Count: 0
|
|
|
ABSTRACT
Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical partly because its search quality is sensitive to several parameters that are quite data dependent. Previous research on LSH, though obtained interesting asymptotic results, provides little guidance on how these parameters should be chosen, and tuning parameters for a given dataset remains a tedious process. To address this problem, we present a statistical performance model of Multi-probe LSH, a state-of-the-art variance of LSH. Our model can accurately predict the average search quality and latency given a small sample dataset. Apart from automatic parameter tuning with the performance model, we also use the model to devise an adaptive LSH search algorithm to determine the probing parameter dynamically for each query. The adaptive probing method addresses the problem that even though the average performance is tuned for optimal, the variance of the performance is extremely high. We experimented with three different datasets including audio, images and 3D shapes to evaluate our methods. The results show the accuracy of the proposed model: the recall errors predicted are within 5% from the real values for most cases; the adaptive search method reduces the standard deviation of recall by about 50% over the existing method.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
|
 |
4
|
Mayur Datar , Nicole Immorlica , Piotr Indyk , Vahab S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the twentieth annual symposium on Computational geometry, June 08-11, 2004, Brooklyn, New York, USA
[doi> 10.1145/997817.997857]
|
| |
5
|
|
| |
6
|
D. Dobkin and R. J. Lipton. Multidimensional searching problems. SIAM Journal on Computing, 5(2):181--186, 1976.
|
| |
7
|
J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren. DARPA TIMIT acoustic-phonetic continuous speech corpus, 1993.
|
| |
8
|
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
 |
13
|
|
 |
14
|
Qin Lv , William Josephson , Zhe Wang , Moses Charikar , Kai Li, Ferret: a toolkit for content-based similarity search of feature-rich data, Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, April 18-21, 2006, Leuven, Belgium
|
| |
15
|
Qin Lv , William Josephson , Zhe Wang , Moses Charikar , Kai Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
| |
16
|
|
| |
17
|
|
 |
18
|
|
| |
19
|
|
 |
20
|
Zhe Wang , Wei Dong , William Josephson , Qin Lv , Moses Charikar , Kai Li, Sizing sketches: a rank-based analysis for similarity search, Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, June 12-16, 2007, San Diego, California, USA
|
| |
21
|
|
|