|
ABSTRACT
In networked IR, a client submits a query to a broker, which is in
contact with a large number of databases. In order to yield a maximum number of documents at minimum cost, the broker has to make estimates about the retrieval cost of each database, and then decide for each database whether or not to use it for the current query, and if, how many documents to retrieve from it. For this purpose, we develop a general decision-theoretic model and discuss different cost structures. Besides cost for retrieving relevant versus nonrelevant documents, we consider the following parameters for each database: expected retrieval quality, expected number of relevant documents in the database and cost factors for query processing and document delivery. For computing the overall optimum, a divide-and-conquer algorithm is given. If there are several brokers knowing different databases, a preselection of brokers can only be performed heuristically, but the computation of the optimum can be done similarily to the single-broker case. In addition, we derive a formula which estimates the number of relevant documents in a database based on dictionary information.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Daniel E. Atkins , William P. Birmingham , Edmund H. Durfee , Eric J. Glover , Tracy Mullen , Elke A. Rundensteiner , Elliot Soloway , José M. Vidal , Raven Wallace , Michael P. Wellman*Authors are listed alphabeticall, Toward Inquiry-Based Education Through Interacting Software Agents, Computer, v.29 n.5, p.69-76, May 1996
[doi> 10.1109/2.494084]
|
 |
2
|
|
| |
3
|
BOOKSTEIN, A. 1983. Outline of a general probabilistic retrieval model. J. Doc. 39, 2, 63-72.
|
| |
4
|
|
 |
5
|
James P. Callan , Zhihong Lu , W. Bruce Croft, Searching distributed collections with inference networks, Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, p.21-28, July 09-13, 1995, Seattle, Washington, United States
[doi> 10.1145/215206.215328]
|
| |
6
|
DANZIG, P., LI, S. -H., AND OBRACZKA, K. 1992. Distributed indexing of autonomous internet services. Comput. Syst. 5, 4, 433-459.
|
| |
7
|
|
 |
8
|
|
 |
9
|
James C. French , Allison L. Powell , Charles L. Viles , Travis Emmitt , Kevin J. Prey, Evaluating database selection techniques: a testbed and experiment, Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, p.121-129, August 24-28, 1998, Melbourne, Australia
[doi> 10.1145/290941.290976]
|
 |
10
|
|
 |
11
|
|
| |
12
|
G VERT, N. 1997. Database selection in networked information retrieval systems. Diploma thesis. Department of Computer Science, University of Dortmund, Dortmund, Germany.
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
ROBERTSON, S. E. 1977. The probability ranking principle in IR. J. Doc. 33, 4, 294-304.
|
 |
18
|
|
| |
19
|
VAN RIJSBERGEN, C.J. 1986. A non-classical logic for information retrieval. Comput. J. 29, 6, 481-485.
|
 |
20
|
Ellen M. Voorhees , Narendra K. Gupta , Ben Johnson-Laird, Learning collection fusion strategies, Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, p.172-179, July 09-13, 1995, Seattle, Washington, United States
[doi> 10.1145/215206.215357]
|
 |
21
|
|
 |
22
|
|
CITED BY 48
|
|
|
|
|
Allison L. Powell , James C. French , Jamie Callan , Margaret Connell , Charles L. Viles, The impact of database selection on distributed searching, Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, p.232-239, July 24-28, 2000, Athens, Greece
|
|
|
James C. French , Allison L. Powell , Jamie Callan , Charles L. Viles , Travis Emmitt , Kevin J. Prey , Yun Mou, Comparing the performance of database selection algorithms, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, p.238-245, August 15-19, 1999, Berkeley, California, United States
|
|
|
James C. French , Allison L. Powell , Fredric Gey , Natalia Perelman, Exploiting a controlled vocabulary to improve collection selection and retrieval effectiveness, Proceedings of the tenth international conference on Information and knowledge management, October 05-10, 2001, Atlanta, Georgia, USA
|
|
|
|
|
|
Nick Craswell , Peter Bailey , David Hawking, Server selection on the World Wide Web, Proceedings of the fifth ACM conference on Digital libraries, p.37-46, June 02-07, 2000, San Antonio, Texas, United States
|
|
|
Jamie Callan , Fabio Crestani , Henrik Nottelmann , Pietro Pala , Xiao Mang Shou, Resource selection and data fusion in multimedia distributed digital libraries, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, July 28-August 01, 2003, Toronto, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Matthias Bender , Sebastian Michel , Peter Triantafillou , Gerhard Weikum , Christian Zimmer, Improving collection selection with overlap awareness in P2P search engines, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sebastian Michel , Matthias Bender , Nikos Ntarmos , Peter Triantafillou , Gerhard Weikum , Christian Zimmer, Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
|
|
|
|
|
|
|
|
|
Jack G. Conrad , Xi S. Guo , Peter Jackson , Monem Meziou, Database selection using actual physical and acquired logical collection resources in a massive domain-specific operational environment, Proceedings of the 28th international conference on Very Large Data Bases, p.71-82, August 20-23, 2002, Hong Kong, China
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|