ACM Home Page
Please provide us with feedback. Feedback
Modeling LSH for performance tuning
Full text PdfPdf (385 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: DB: indexing and physical query optimization table of contents
Pages 669-678  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Wei Dong  Princeton University, Princeton, NJ, USA
Zhe Wang  Princeton University, Princeton, NJ, USA
William Josephson  Princeton University, Princeton, NJ, USA
Moses Charikar  Princeton University, Princeton, NJ, USA
Kai Li  Princeton University, Princeton, NJ, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 102,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458172
What is a DOI?

ABSTRACT

Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical partly because its search quality is sensitive to several parameters that are quite data dependent. Previous research on LSH, though obtained interesting asymptotic results, provides little guidance on how these parameters should be chosen, and tuning parameters for a given dataset remains a tedious process.

To address this problem, we present a statistical performance model of Multi-probe LSH, a state-of-the-art variance of LSH. Our model can accurately predict the average search quality and latency given a small sample dataset. Apart from automatic parameter tuning with the performance model, we also use the model to devise an adaptive LSH search algorithm to determine the probing parameter dynamically for each query. The adaptive probing method addresses the problem that even though the average performance is tuned for optimal, the variance of the performance is extremely high. We experimented with three different datasets including audio, images and 3D shapes to evaluate our methods. The results show the accuracy of the proposed model: the recall errors predicted are within 5% from the real values for most cases; the adaptive search method reduces the standard deviation of recall by about 50% over the existing method.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
4
 
5
 
6
D. Dobkin and R. J. Lipton. Multidimensional searching problems. SIAM Journal on Computing, 5(2):181--186, 1976.
 
7
J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren. DARPA TIMIT acoustic-phonetic continuous speech corpus, 1993.
 
8
9
10
11
 
12
13
14
 
15
 
16
 
17
18
 
19
20
 
21

Collaborative Colleagues:
Wei Dong: colleagues
Zhe Wang: colleagues
William Josephson: colleagues
Moses Charikar: colleagues
Kai Li: colleagues