Improving collection selection with overlap awareness in P2P search engines
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Salvador, Brazil
SESSION: Distributed
Pages: 67 - 74
Year of Publication: 2005
ISBN:1-59593-034-5
Authors
Matthias Bender, Max-Planck-Institut für Informatik, Saarbrücken, Germany
Sebastian Michel, Max-Planck-Institut für Informatik, Saarbrücken, Germany
Peter Triantafillou, University of Patras, Rio, Greece
Gerhard Weikum, Max-Planck-Institut für Informatik, Saarbrücken, Germany
Christian Zimmer, Max-Planck-Institut für Informatik, Saarbrücken, Germany
ABSTRACT
Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections typically overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for the extension of existing quality measures using estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.
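The idea of extending a quality measure with an overlap estimate can be illustrated with a minimal sketch. It is not the authors' algorithm or the MINERVA implementation: it assumes each collection exposes an explicit set of document IDs (a real system would use compact overlap estimators instead), and the names `select_collections`, `quality`, and `docs` are hypothetical. Each round picks the collection whose quality score, weighted by the fraction of its documents not yet covered, is highest.

```python
from typing import Dict, List, Set


def select_collections(
    quality: Dict[str, float],
    docs: Dict[str, Set[int]],
    k: int,
) -> List[str]:
    """Greedily pick up to k collections, weighting quality by novelty.

    quality: hypothetical per-collection quality estimate (e.g. a CORI-like score).
    docs:    hypothetical explicit document-ID sets, standing in for overlap estimators.
    """
    covered: Set[int] = set()      # documents already reachable via chosen collections
    remaining = set(quality)       # collections not yet chosen
    chosen: List[str] = []

    while remaining and len(chosen) < k:

        def score(name: str) -> float:
            size = len(docs[name])
            if size == 0:
                return 0.0
            # Novelty: fraction of this collection not covered so far.
            novelty = len(docs[name] - covered) / size
            return quality[name] * novelty

        # Sort first so ties are broken deterministically by name.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        covered |= docs[best]
        remaining.remove(best)

    return chosen


# Toy data: B duplicates A entirely, C is disjoint but has lower quality.
quality = {"A": 0.9, "B": 0.85, "C": 0.5}
docs = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 4}, "C": {5, 6, 7, 8}}
result = select_collections(quality, docs, k=2)
print(result)
```

On this toy input a quality-only ranking would contact A and then B, which adds no new documents; the overlap-aware variant skips B in favor of C, so two contacts cover all eight documents.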
CITED BY 18
Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, Christian Zimmer, MINERVA: collaborative P2P search, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
Klaus Berberich, Manolis Koubarakis, Christos Tryfonopoulos, Gerhard Weikum, Christian Zimmer, MAPS: approximate publish/subscribe functionality in peer-to-peer networks, Proceedings of the 1st international workshop on Advanced data processing in ubiquitous computing (ADPUC 2006), November 27-December 01, 2006, Melbourne, Australia
Sebastian Michel, Matthias Bender, Nikos Ntarmos, Peter Triantafillou, Gerhard Weikum, Christian Zimmer, Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
Toan Luu, Fabius Klemm, Ivana Podnar, Martin Rajman, Karl Aberer, ALVIS peers: a scalable full-text peer-to-peer retrieval engine, Proceedings of the international workshop on Information retrieval in peer-to-peer networks, November 11-11, 2006, Arlington, Virginia, USA
B. Barla Cambazoglu, Evren Karaca, Tayfun Kucukyilmaz, Ata Turk, Cevdet Aykanat, Architecture of a grid-enabled Web search engine, Information Processing and Management: an International Journal, v.43 n.3, p.609-623, May, 2007
Gleb Skobeltsyn, Toan Luu, Ivana Podnar Zarko, Martin Rajman, Karl Aberer, Web text retrieval with a P2P query-driven index, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
Gleb Skobeltsyn, Toan Luu, Ivana Podnar Žarko, Martin Rajman, Karl Aberer, Query-driven indexing for scalable peer-to-peer text retrieval, Proceedings of the 2nd international conference on Scalable information systems, June 06-08, 2007, Suzhou, China
Gleb Skobeltsyn, Toan Luu, Ivana Podnar Žarko, Martin Rajman, Karl Aberer, Query-driven indexing for scalable peer-to-peer text retrieval, Future Generation Computer Systems, v.25 n.1, p.89-99, January, 2009
Thomas Neumann, Matthias Bender, Sebastian Michel, Ralf Schenkel, Peter Triantafillou, Gerhard Weikum, Distributed top-k aggregation queries at large, Distributed and Parallel Databases, v.26 n.1, p.3-27, August 2009