Quantifying performance and quality gains in distributed web search engines
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Federated, distributed search
Pages 411-418
Year of Publication: 2009
ISBN: 978-1-60558-483-6
ABSTRACT
Distributed search engines based on geographical partitioning of a central Web index are emerging as a feasible solution to the immense growth of the Web, user bases, and query traffic. However, there is still a lack of research quantifying the performance and quality gains that such architectures can achieve. In this paper, we develop various cost models to evaluate the performance benefits of a geographically distributed search engine architecture based on partial index replication and query forwarding. Specifically, we focus on possible performance gains due to the distributed nature of query processing and Web crawling. We show that any response time gain achieved by distributed query processing can be used to improve search relevance, since it makes room for complex but more accurate document ranking algorithms. We also show that distributed Web crawling leads to better Web coverage, and we examine whether this improves search quality. We verify the validity of our claims over large, real-life datasets via simulations.