The impact of database selection on distributed searching
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 232-239
Year of Publication: 2000
ISBN: 1-58113-226-3
Authors
Allison L. Powell, Department of Computer Science, University of Virginia
James C. French, Department of Computer Science, University of Virginia
Jamie Callan, School of Computer Science, Carnegie Mellon University
Margaret Connell, Center for Intelligent Information Retrieval, University of Massachusetts
Charles L. Viles, School of Information and Library Science, University of North Carolina, Chapel Hill
Sponsors
Athens U of Econ & Business : Athens University of Economics and Business
Greek Com Soc : Greek Computer Society
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 48,   Citation Count: 43
DOI: http://doi.acm.org/10.1145/345508.345584

ABSTRACT

The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in three different distributed retrieval testbeds and distill some general results. First, we find that good database selection can result in better retrieval effectiveness than can be achieved in a centralized database. Second, we find that good performance can be achieved when only a few sites are selected, and that performance generally increases as more sites are selected. Finally, we find that when database selection is employed, it is not necessary to maintain collection-wide information (CWI), e.g. global idf; local information can be used to achieve superior performance. This means that distributed systems can be engineered with more autonomy and less cooperation. This work suggests that improvements in database selection can lead to broader improvements in retrieval performance, even in centralized (i.e. single-database) systems: given a centralized database and a good selection mechanism, retrieval performance can be improved by decomposing that database conceptually and employing a selection step.
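
The three parts named in the abstract (database selection, query processing, results merging) can be illustrated with a small Python sketch. This is only a toy under stated assumptions, not the method evaluated in the paper: the selection score (sum of per-term document-frequency fractions), the tf-idf variant, the raw-score merge, and all function names are hypothetical choices made for illustration. The one point it is meant to show is that each selected database computes idf from its own documents, so no collection-wide information is exchanged.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def select_databases(query, databases, k):
    """Database selection: keep the k databases that look most relevant.

    Assumed stand-in score: for each query term, the fraction of the
    database's documents that contain it, summed over query terms.
    """
    terms = set(tokenize(query))
    scores = {}
    for name, docs in databases.items():
        doc_sets = [set(tokenize(d)) for d in docs]
        if not doc_sets:
            scores[name] = 0.0
            continue
        scores[name] = sum(
            sum(1 for s in doc_sets if t in s) / len(doc_sets) for t in terms
        )
    # Highest score first; ties broken by name so the result is deterministic.
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


def search_local(query, docs):
    """Query processing inside one database, using local idf only."""
    terms = tokenize(query)
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    results = []
    for doc, toks in zip(docs, tokenized):
        tf = Counter(toks)
        score = 0.0
        for t in terms:
            df = sum(1 for ts in tokenized if t in ts)  # local document frequency
            if df:
                score += tf[t] * math.log(1 + n / df)
        if score > 0:
            results.append((score, doc))
    return results


def distributed_search(query, databases, k):
    """Select k databases, search each locally, merge by raw score."""
    selected = select_databases(query, databases, k)
    merged = []
    for name in selected:
        for score, doc in search_local(query, databases[name]):
            merged.append((score, name, doc))
    merged.sort(key=lambda r: (-r[0], r[1], r[2]))
    return selected, merged


if __name__ == "__main__":
    example = {
        "sports": ["football match report", "tennis match results"],
        "cooking": ["pasta recipe", "bread recipe with yeast"],
    }
    print(distributed_search("football match", example, k=1))
```

Merging by raw local scores is the simplest possible choice and is used here only to keep the sketch short; scores computed with different local statistics are not strictly comparable, which is why results merging is treated as its own part of the problem.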


