Peer-to-peer similarity search over widely distributed document collections
Source
Conference on Information and Knowledge Management archive
Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
Napa Valley, California, USA
SESSION: Similarity search and resource selection
Pages 35-42  
Year of Publication: 2008
ISBN:978-1-60558-254-2
Authors
Christos Doulkeridis  AUEB, Athens, Greece
Kjetil Nørvåg  NTNU, Trondheim, Norway
Michalis Vazirgiannis  AUEB, Athens, Greece
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458469.1458477

ABSTRACT

This paper addresses the challenging problem of similarity search over widely distributed, ultra-high-dimensional data. One such application is the retrieval of the top-k most similar documents in a widely distributed document collection, as in the case of digital libraries. Peer-to-peer (P2P) systems are emerging as a promising solution for content management over highly distributed data collections. We propose a self-organizing P2P approach in which an unstructured P2P network evolves into a super-peer architecture, with super-peers responsible for peers with similar content. Our approach is based on distributed clustering of peer contents, which creates high-quality clusters that span the entire network. More importantly, we show how to process similarity queries efficiently by capitalizing on the newly constructed, clustered super-peer network. During query processing, the query is propagated only to a few carefully selected super-peers that are able to return high-quality results. We evaluate the performance of our approach and demonstrate its advantages through simulation experiments on two document collections.
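
The query-routing idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not say how super-peers summarize their clusters or how they are selected, so the sketch assumes each super-peer keeps a centroid of its documents' term-frequency vectors and that the query goes to the `fanout` super-peers whose centroids are most cosine-similar to it. The names `SuperPeer`, `route_query`, `vectorize`, and `fanout` are hypothetical and chosen only for illustration.

```python
import heapq
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-frequency vector (assumed representation)."""
    return dict(Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average a list of sparse vectors into one cluster summary."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds weights term by term
    return {term: w / len(vectors) for term, w in total.items()}


class SuperPeer:
    """Hypothetical super-peer holding one cluster of similar documents."""

    def __init__(self, name, documents):
        self.name = name
        self.docs = {doc_id: vectorize(text) for doc_id, text in documents.items()}
        # Assumption: the cluster is summarized by the centroid of its documents.
        self.summary = centroid(list(self.docs.values()))

    def local_top_k(self, query, k):
        """Return the k best (score, doc_id) pairs from this cluster."""
        scored = ((cosine(query, vec), doc_id) for doc_id, vec in self.docs.items())
        return heapq.nlargest(k, scored)


def route_query(super_peers, query_text, k, fanout):
    """Send the query only to the `fanout` most promising super-peers,
    then merge their local top-k lists into a global top-k."""
    query = vectorize(query_text)
    selected = heapq.nlargest(
        fanout, super_peers, key=lambda sp: cosine(query, sp.summary)
    )
    candidates = []
    for sp in selected:
        candidates.extend(sp.local_top_k(query, k))
    top = heapq.nlargest(k, candidates)
    return [doc_id for _, doc_id in top], [sp.name for sp in selected]
```

Calling `route_query(super_peers, "database index", k=1, fanout=1)` contacts a single super-peer and returns its best-matching document. A larger `fanout` trades extra messages for a better chance of recovering the true global top-k.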


