Improving collection selection with overlap awareness in P2P search engines
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Salvador, Brazil
SESSION: Distributed
Pages: 67 - 74
Year of Publication: 2005
ISBN: 1-59593-034-5
Authors
Matthias Bender  Max-Planck-Institut für Informatik, Saarbrücken, Germany
Sebastian Michel  Max-Planck-Institut für Informatik, Saarbrücken, Germany
Peter Triantafillou  University of Patras, Rio, Greece
Gerhard Weikum  Max-Planck-Institut für Informatik, Saarbrücken, Germany
Christian Zimmer  Max-Planck-Institut für Informatik, Saarbrücken, Germany
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1076034.1076049

ABSTRACT

Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections typically overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for the extension of existing quality measures using estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.
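
The idea of combining a per-collection quality estimate with mutual overlap can be illustrated with a small greedy sketch. This is not the authors' algorithm or the MINERVA implementation; the abstract does not specify the estimators used. The function name `select_collections`, the multiplicative `quality * novelty` score, and the use of exact document-ID sets are all assumptions made for illustration (a real distributed system would have to estimate overlap from compact statistics rather than full sets).

```python
from typing import Dict, List, Set


def select_collections(
    quality: Dict[str, float],
    docs: Dict[str, Set[str]],
    k: int,
) -> List[str]:
    """Greedily pick up to k collections, discounting quality by overlap.

    quality: hypothetical per-collection quality estimate (e.g. a CORI-like score).
    docs:    hypothetical exact document-ID sets, standing in for overlap estimators.
    """
    covered: Set[str] = set()      # documents reachable via collections chosen so far
    remaining = set(quality)       # collections not yet chosen
    chosen: List[str] = []

    while remaining and len(chosen) < k:
        def score(name: str) -> float:
            size = len(docs[name])
            # Novelty: fraction of this collection not already covered.
            novelty = len(docs[name] - covered) / size if size else 0.0
            return quality[name] * novelty

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        if score(best) == 0.0:
            break                  # nothing new can be gained from any collection
        chosen.append(best)
        covered |= docs[best]
        remaining.remove(best)

    return chosen


if __name__ == "__main__":
    quality = {"A": 0.9, "B": 0.85, "C": 0.5}
    docs = {
        "A": {"d1", "d2", "d3", "d4"},
        "B": {"d1", "d2", "d3", "d5"},   # overlaps heavily with A
        "C": {"d6", "d7", "d8"},         # disjoint from A and B
    }
    print(select_collections(quality, docs, k=2))
```

In this toy input, a ranking by quality alone would contact A and then B, even though B contributes only one document that A lacks. The overlap-aware score picks A and then C, which matches the abstract's point that fewer collections need to be contacted to reach a given level of recall.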


