ABSTRACT
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in three different distributed retrieval testbeds and distill some general results. First, we find that good database selection can result in better retrieval effectiveness than can be achieved in a centralized database. Second, we find that good performance can be achieved when only a few sites are selected and that performance generally increases as more sites are selected. Finally, we find that when database selection is employed, it is not necessary to maintain collection-wide information (CWI), e.g., global idf; local information can be used to achieve superior performance. This means that distributed systems can be engineered with more autonomy and less cooperation. This work suggests that improvements in database selection can lead to broader improvements in retrieval performance, even in centralized (i.e., single-database) systems. Given a centralized database and a good selection mechanism, retrieval performance can be improved by decomposing that database conceptually and employing a selection step.
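The three-part view of distributed searching described above (database selection, query processing, results merging) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy selection score, the tf-idf weighting, and the merge by raw score are hypothetical stand-ins, not the selection or merging algorithms the authors evaluated. The one point it tries to reflect is that each selected database is scored with its own local idf, with no collection-wide information.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def select_databases(query, databases, k):
    """Rank databases by a toy score and keep the top k.

    Hypothetical score: number of (document, query term) matches
    divided by the number of documents in the database.
    """
    terms = set(tokenize(query))

    def score(docs):
        if not docs:
            return 0.0
        hits = sum(
            1
            for text in docs.values()
            for t in terms
            if t in set(tokenize(text))
        )
        return hits / len(docs)

    ranked = sorted(databases, key=lambda name: score(databases[name]), reverse=True)
    return ranked[:k]


def search_local(query, docs):
    """Score documents by tf-idf using only this database's statistics (local idf)."""
    n = len(docs)
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency computed inside this database only.
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    results = []
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        s = sum(tf[t] * math.log(1 + n / df[t]) for t in tokenize(query) if t in tf)
        if s > 0:
            results.append((doc_id, s))
    return results


def distributed_search(query, databases, k, top_n=10):
    """Select k databases, query each locally, and merge results by raw score."""
    selected = select_databases(query, databases, k)
    merged = []
    for name in selected:
        merged.extend(
            (name, doc_id, s) for doc_id, s in search_local(query, databases[name])
        )
    merged.sort(key=lambda r: r[2], reverse=True)
    return selected, merged[:top_n]


# Made-up example data, for illustration only.
databases = {
    "news": {"n1": "stock market rally", "n2": "election results"},
    "sports": {"s1": "football match results", "s2": "tennis final"},
    "tech": {
        "t1": "distributed search engine",
        "t2": "database selection for distributed search",
    },
}

selected, results = distributed_search("distributed search", databases, k=1)
```

In this sketch, raising `k` widens the search to more sites, and merging by raw local scores is the simplest possible choice; it ignores the fact that scores from different databases are not directly comparable, which is the kind of issue results merging has to address in practice.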