ABSTRACT
This paper addresses the challenging problem of similarity search over widely distributed, ultra-high-dimensional data. One such application is the retrieval of the top-k most similar documents in a widely distributed document collection, as in the case of digital libraries. Peer-to-peer (P2P) systems have emerged as a promising solution for content management over highly distributed data collections. We propose a self-organizing P2P approach in which an unstructured P2P network evolves into a super-peer architecture, with each super-peer responsible for peers with similar content. Our approach is based on distributed clustering of peer contents, which creates high-quality clusters that span the entire network. More importantly, we show how to process similarity queries efficiently by capitalizing on the newly constructed, clustered super-peer network. During query processing, the query is propagated only to a few carefully selected super-peers that are able to return high-quality results. We evaluate the performance of our approach and demonstrate its advantages through simulation experiments on two document collections.