|
ABSTRACT
In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing in high dimensional data. Our approach is based on Locality Sensitive Hashing (LSH) which has proven very efficient in answering KNN queries in centralized settings. We consider mappings from the multi-dimensional LSH bucket space to the linearly ordered set of peers that jointly maintain the indexed data and derive requirements to achieve high quality search results and limit the number of network accesses. We put forward two such mappings that come with these salient properties: being locality preserving so that buckets likely to hold similar data are stored on the same or neighboring peers and having a predictable output distribution to ensure fair load balancing. We show how to leverage the linearly aligned data for efficient KNN search and how to efficiently process range queries which is, to the best of our knowledge, not possible in existing LSH schemes. We show by comprehensive performance evaluations using real world data that our approach brings major performance and accuracy gains compared to state-of-the-art.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
Vassilis Athitsos, Michalis Potamias, Panagiotis Papapetrou, and George Kollios. Nearest neighbor retrieval using distance-based hashing. In ICDE, 2008.
|
 |
5
|
|
 |
6
|
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
 |
10
|
Ashwin R. Bharambe , Mukesh Agrawal , Srinivasan Seshan, Mercury: supporting scalable multi-attribute range queries, Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, August 30-September 03, 2004, Portland, Oregon, USA
|
| |
11
|
Erik Buchmann and Klemens Böhm. Efficient evaluation of nearest-neighbor queries in content-addressable networks. In From Integrated Publication and Information Systems to Virtual Information and Knowledge Environments, pages 31--40, 2005.
|
| |
12
|
|
 |
13
|
Adina Crainiceanu , Prakash Linga , Ashwin Machanavajjhala , Johannes Gehrke , Jayavel Shanmugasundaram, P-ring: an efficient and robust P2P range index structure, Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 11-14, 2007, Beijing, China
[doi> 10.1145/1247480.1247507]
|
 |
14
|
Mayur Datar , Nicole Immorlica , Piotr Indyk , Vahab S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the twentieth annual symposium on Computational geometry, June 08-11, 2004, Brooklyn, New York, USA
[doi> 10.1145/997817.997857]
|
| |
15
|
|
| |
16
|
Fabrizio Falchi, Claudio Gennaro, and Pavel Zezula. A content-addressable network for similarity search in metric spaces. In DBISP2P, 2005.
|
| |
17
|
|
 |
18
|
|
| |
19
|
Parisa Haghani, Sebastian Michel, Philippe Cudré-Mauroux, and Karl Aberer. LSH at large -- distributed knn search in high dimensions. In WebDB, 2008.
|
 |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
Qin Lv , William Josephson , Zhe Wang , Moses Charikar , Kai Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
| |
24
|
Michael Ortega , Yong Rui , Kaushik Chakrabarti , Kriengkrai Porkaew , Sharad Mehrotra , Thomas S. Huang, Supporting Ranked Boolean Similarity Queries in MARS, IEEE Transactions on Knowledge and Data Engineering, v.10 n.6, p.905-925, November 1998
[doi> 10.1109/69.738357]
|
 |
25
|
|
| |
26
|
Theoni Pitoura, Nikos Ntarmos, and Peter Triantafillou. Replication, load balancing and efficient range query processing in dhts. In EDBT, 2006.
|
| |
27
|
Theoni Pitoura and Peter Triantafillou. Load distribution fairness in p2p data management systems. In ICDE, 2007.
|
 |
28
|
Sylvia Ratnasamy , Paul Francis , Mark Handley , Richard Karp , Scott Schenker, A scalable content-addressable network, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.161-172, August 2001, San Diego, California, United States
|
| |
29
|
Ozgur D. Sahin, Fatih Emekçi, Divyakant Agrawal, and Amr El Abbadi. Content-based similarity search over peer-to-peer systems. In DBISP2P, 2004.
|
 |
30
|
Ion Stoica , Robert Morris , David Karger , M. Frans Kaashoek , Hari Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.149-160, August 2001, San Diego, California, United States
|
 |
31
|
Chunqiang Tang , Zhichen Xu , Sandhya Dwarkadas, Peer-to-peer information retrieval using self-organizing semantic overlay networks, Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, August 25-29, 2003, Karlsruhe, Germany
[doi> 10.1145/863955.863976]
|
| |
32
|
|
| |
33
|
Chi Zhang, Arvind Krishnamurthy, and Randolph Y. Wang. Skipindex: Towards a scalable peer-to-peer index service for high dimensional data. Technical report, Department of Computer Science, Princeton University, May 2004.
|
|