ABSTRACT
This paper addresses the problem of using a broker to select a subset of available information servers so as to achieve a good trade-off between document retrieval effectiveness and cost. It investigates server selection methods that can operate without global information and where servers have no knowledge of brokers. A novel method using Lightweight Probe queries (the LWP method) is compared with several methods based on data from past query processing, with Random and Optimal server rankings serving as controls. The methods are evaluated on TREC data and relevance judgments. For each method, ratios of recall and early precision are computed, both empirical and ideal, for the selected subset versus the complete set of available servers. Estimates are also made of the best-possible performance of each method. The LWP and Topic Similarity methods achieved the best results: each retrieved about 60% of the relevant documents for only one-third of the cost of querying all servers. Subject to the applicable cost model, the LWP method is likely to be preferred because it is suited to dynamic environments. The good results obtained with a simple automatic LWP implementation were replicated using different data and a larger set of query topics.
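The evaluation idea above can be sketched briefly. Servers are ranked, and the measure is the share of all relevant documents that the top-ranked subset holds. This is a minimal sketch under stated assumptions, not the authors' implementation. The probe reply fields (`term_doc_counts`, `all_terms_count`), the weighting in `probe_score`, and every name here are hypothetical. `recall_ratio` is an ideal-style ratio computed from per-server counts of judged-relevant documents. Cost is assumed to be proportional to the number of servers queried.

```python
from dataclasses import dataclass


@dataclass
class ProbeReply:
    """Statistics a server might return for a short probe query (hypothetical fields)."""
    server: str
    term_doc_counts: dict[str, int]  # documents containing each probe term
    all_terms_count: int             # documents containing every probe term


def probe_score(reply: ProbeReply, w_all: float = 10.0) -> float:
    # Hypothetical scoring, not the paper's formula: documents matching all
    # probe terms count w_all times as much as single-term matches.
    return w_all * reply.all_terms_count + sum(reply.term_doc_counts.values())


def rank_by_probe(replies: list[ProbeReply]) -> list[str]:
    # Highest-scoring servers first; ties broken by server name for determinism.
    ordered = sorted(replies, key=lambda r: (-probe_score(r), r.server))
    return [r.server for r in ordered]


def optimal_ranking(relevant: dict[str, int]) -> list[str]:
    # Control ranking: servers ordered by how many judged-relevant documents they hold.
    return sorted(relevant, key=lambda s: (-relevant[s], s))


def recall_ratio(ranking: list[str], relevant: dict[str, int], n: int) -> float:
    """Share of all judged-relevant documents held by the first n ranked servers."""
    total = sum(relevant.values())
    if total == 0:
        return 0.0
    return sum(relevant.get(s, 0) for s in ranking[:n]) / total


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the TREC experiments.
    relevant = {"A": 6, "B": 3, "C": 1}
    replies = [
        ProbeReply("A", {"server": 40, "selection": 25}, 12),
        ProbeReply("B", {"server": 90, "selection": 5}, 2),
        ProbeReply("C", {"server": 10, "selection": 8}, 1),
    ]
    lwp = rank_by_probe(replies)
    best = optimal_ranking(relevant)
    n = 1  # one of three servers: one-third of the cost if cost ~ servers queried
    print(lwp, recall_ratio(lwp, relevant, n))    # ['A', 'B', 'C'] 0.6
    print(best, recall_ratio(best, relevant, n))  # ['A', 'B', 'C'] 0.6
```

On this ideal ratio, no ranking can beat the Optimal control at any cutoff n. Optimal orders servers by exactly the counts the ratio sums. The gap between a candidate ranking and Optimal as n grows is what such an evaluation exposes.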
CITED BY 46
Leah S. Larkey, Margaret E. Connell, Jamie Callan, Collection selection and results merging with topically organized U.S. patents and TREC data, Proceedings of the ninth international conference on Information and knowledge management, p.282-289, November 06-11, 2000, McLean, Virginia, United States
Allison L. Powell, James C. French, Jamie Callan, Margaret Connell, Charles L. Viles, The impact of database selection on distributed searching, Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, p.232-239, July 24-28, 2000, Athens, Greece
Robert W.P. Luk, H. V. Leong, Tharam S. Dillon, Alvin T.S. Chan, W. Bruce Croft, James Allan, A survey in indexing and searching XML documents, Journal of the American Society for Information Science and Technology, v.53 n.6, p.415-437, May, 2002
Zonghuan Wu, Weiyi Meng, Clement Yu, Zhuogang Li, Towards a highly-scalable and effective metasearch engine, Proceedings of the 10th international conference on World Wide Web, p.386-395, May 01-05, 2001, Hong Kong, Hong Kong
James C. French, Allison L. Powell, Fredric Gey, Natalia Perelman, Exploiting a controlled vocabulary to improve collection selection and retrieval effectiveness, Proceedings of the tenth international conference on Information and knowledge management, October 05-10, 2001, Atlanta, Georgia, USA
Nick Craswell, Peter Bailey, David Hawking, Server selection on the World Wide Web, Proceedings of the fifth ACM conference on Digital libraries, p.37-46, June 02-07, 2000, San Antonio, Texas, United States
Luo Si, Rong Jin, Jamie Callan, Paul Ogilvie, A language modeling framework for resource selection and results merging, Proceedings of the eleventh international conference on Information and knowledge management, November 04-09, 2002, McLean, Virginia, USA
Nick Craswell, Francis Crimmins, David Hawking, Alistair Moffat, Performance and cost tradeoffs in Web search, Proceedings of the fifteenth Australasian database conference, p.161-169, January 01, 2004, Dunedin, New Zealand
Jack G. Conrad, Xi S. Guo, Peter Jackson, Monem Meziou, Database selection using actual physical and acquired logical collection resources in a massive domain-specific operational environment, Proceedings of the 28th international conference on Very Large Data Bases, p.71-82, August 20-23, 2002, Hong Kong, China