ACM Home Page
Please provide us with feedback. Feedback
Distributed similarity search in high dimensions using locality sensitive hashing
Full text PdfPdf (667 KB)
Source Extending Database Technology; Vol. 360 archive
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology table of contents
Saint Petersburg, Russia
SESSION: Research sessions: Multi-dimensional table of contents
Pages 744-755  
Year of Publication: 2009
ISBN:978-1-60558-422-5
Authors
Parisa Haghani  EPFL Lausanne, Switzerland
Sebastian Michel  EPFL Lausanne, Switzerland
Karl Aberer  EPFL Lausanne, Switzerland
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 22,   Downloads (12 Months): 128,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1516360.1516446
What is a DOI?

ABSTRACT

In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing in high dimensional data. Our approach is based on Locality Sensitive Hashing (LSH) which has proven very efficient in answering KNN queries in centralized settings. We consider mappings from the multi-dimensional LSH bucket space to the linearly ordered set of peers that jointly maintain the indexed data and derive requirements to achieve high quality search results and limit the number of network accesses. We put forward two such mappings that come with these salient properties: being locality preserving so that buckets likely to hold similar data are stored on the same or neighboring peers and having a predictable output distribution to ensure fair load balancing. We show how to leverage the linearly aligned data for efficient KNN search and how to efficiently process range queries which is, to the best of our knowledge, not possible in existing LSH schemes. We show by comprehensive performance evaluations using real world data that our approach brings major performance and accuracy gains compared to state-of-the-art.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
Vassilis Athitsos, Michalis Potamias, Panagiotis Papapetrou, and George Kollios. Nearest neighbor retrieval using distance-based hashing. In ICDE, 2008.
5
6
7
8
 
9
10
 
11
Erik Buchmann and Klemens Böhm. Efficient evaluation of nearest-neighbor queries in content-addressable networks. In From Integrated Publication and Information Systems to Virtual Information and Knowledge Environments, pages 31--40, 2005.
 
12
13
14
 
15
 
16
Fabrizio Falchi, Claudio Gennaro, and Pavel Zezula. A content-addressable network for similarity search in metric spaces. In DBISP2P, 2005.
 
17
18
 
19
Parisa Haghani, Sebastian Michel, Philippe Cudré-Mauroux, and Karl Aberer. LSH at large -- distributed knn search in high dimensions. In WebDB, 2008.
20
 
21
22
 
23
 
24
25
 
26
Theoni Pitoura, Nikos Ntarmos, and Peter Triantafillou. Replication, load balancing and efficient range query processing in dhts. In EDBT, 2006.
 
27
Theoni Pitoura and Peter Triantafillou. Load distribution fairness in p2p data management systems. In ICDE, 2007.
28
 
29
Ozgur D. Sahin, Fatih Emekçi, Divyakant Agrawal, and Amr El Abbadi. Content-based similarity search over peer-to-peer systems. In DBISP2P, 2004.
30
31
 
32
 
33
Chi Zhang, Arvind Krishnamurthy, and Randolph Y. Wang. Skipindex: Towards a scalable peer-to-peer index service for high dimensional data. Technical report, Department of Computer Science, Princeton University, May 2004.

Collaborative Colleagues:
Parisa Haghani: colleagues
Sebastian Michel: colleagues
Karl Aberer: colleagues