ABSTRACT
We have investigated two major issues in Distributed Information Retrieval (DIR): collection selection and search results merging. While most published work on these two issues relies on pre-stored metadata, the approaches described in this paper extract the required information at the time the query is processed. To predict the relevance of collections to a given query, we analyse a limited number of full documents (e.g., the top five documents) retrieved from each collection and consider term proximity within them. Our merging technique, by contrast, is simple: it requires as input only the document scores and the lengths of the result lists. Our experiments evaluate the retrieval effectiveness of these approaches and compare them with centralised indexing and with various other DIR techniques (e.g., CORI). We conducted our experiments on two testbeds: one containing news articles extracted from four different sources (2 GB) and another containing 10 GB of Web pages. Our evaluations demonstrate that the retrieval effectiveness of our simple approaches is worth considering.
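The abstract states only that merging uses document scores and the lengths of the result lists; it does not give a formula. The following is a minimal sketch of one way such a merge could work, under stated assumptions: the function name `merge_results`, the constant `k`, and the logarithmic weighting are illustrative choices, not taken from the text above.

```python
import math


def merge_results(result_lists, k=600.0):
    """Merge per-collection result lists into a single ranking.

    result_lists maps a collection name to a list of (doc_id, score) pairs.
    Only the document scores and the list lengths are used as input.
    """
    lengths = {name: len(docs) for name, docs in result_lists.items()}
    total = sum(lengths.values())
    if total == 0:
        return []

    # Assumption: a collection's score grows with its share of all results.
    coll_score = {
        name: math.log(1.0 + n * k / total) for name, n in lengths.items()
    }
    mean_score = sum(coll_score.values()) / len(coll_score)

    # Assumption: weight is 1 plus the relative deviation from the mean score.
    weight = {
        name: 1.0 + (s - mean_score) / mean_score
        for name, s in coll_score.items()
    }

    # Rescale each document score by its collection weight, then sort.
    merged = [
        (doc_id, score * weight[name], name)
        for name, docs in result_lists.items()
        for doc_id, score in docs
    ]
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged
```

With this weighting, collections that return longer result lists have their document scores boosted slightly, while collections that return equally long lists are left with their original scores.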
CITED BY 13
D. Lillis, F. Toolan, A. Mur, L. Peng, R. Collier, J. Dunnion, Probability-based fusion of information retrieval result sets, Artificial Intelligence Review, v.25 n.1-2, p.179-191, April 2006
Milad Shokouhi, Justin Zobel, Yaniv Bernstein, Distributed text retrieval from overlapping collections, Proceedings of the eighteenth conference on Australasian database, p.141-150, January 30-February 02, 2007, Ballarat, Victoria, Australia