ACM Home Page
Please provide us with feedback. Feedback
LSH forest: self-tuning indexes for similarity search
Full text PdfPdf (248 KB)
Source International World Wide Web Conference archive
Proceedings of the 14th international conference on World Wide Web table of contents
Chiba, Japan
SESSION: Link-based similarity table of contents
Pages: 651 - 660  
Year of Publication: 2005
ISBN:1-59593-046-9
Authors
Mayank Bawa  Stanford University, Stanford, CA
Tyson Condie  U. C. Berkeley, Berkeley, CA
Prasanna Ganesan  Stanford University, Stanford, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 95,   Citation Count: 14
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1060745.1060840
What is a DOI?

ABSTRACT

We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of settings: Web search engines desire fast, parallel, main-memory-based indexes for similarity search on text data; database systems desire disk-based similarity indexes for high-dimensional data, including text and images; peer-to-peer systems desire distributed similarity indexes with low communication cost. We propose an indexing scheme called LSH Forest which is applicable in all the above contexts. Our index uses the well-known technique of locality-sensitive hashing (LSH), but improves upon previous designs by (a) eliminating the different data-dependent parameters for which LSH must be constantly hand-tuned, and (b) improving on LSH's performance guarantees for skewed data distributions while retaining the same storage and query overhead. We show how to construct this index in main memory, on disk, in parallel systems, and in peer-to-peer systems. We evaluate the design with experiments on multiple text corpora and demonstrate both the self-tuning nature and the superior performance of LSH Forest.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
K. Aberer. Scalable data access in p2p systems using unbalanced search trees. In Proc. WDAS, 2002.
2
3
 
4
 
5
6
 
7
 
8
 
9
10
 
11
 
12
13
 
14
15
 
16
17
 
18
P. Ganesan, M. Bawa, and H. Garcia-Molina. Online balancing of range-partitioned data with applications to peer-to-peer systems. In Proc. VLDB, 2004.
19
 
20
 
21
A. Gutman. R-trees: A dynamic index structure for spatial searching. In Proc. of SIGMOD, 1997.
22
23
 
24
N. Katayama and S. Satoh. The r*-tree: An efficient and robust access method for points and rectangles. In Proc. of SIGMOD, 1997.
 
25
 
26
27
28
29
30
31
 
32
 
33

CITED BY  14

Collaborative Colleagues:
Mayank Bawa: colleagues
Tyson Condie: colleagues
Prasanna Ganesan: colleagues