Classification-based resource selection
Source
Conference on Information and Knowledge Management archive
Proceedings of the 18th ACM conference on Information and knowledge management
Hong Kong, China
SESSION: IR ranking and retrieval models II
Pages: 1277-1286
Year of Publication: 2009
ISBN: 978-1-60558-512-3
Authors
Jaime Arguello  Carnegie Mellon University, Pittsburgh, PA, USA
Jamie Callan  Carnegie Mellon University, Pittsburgh, PA, USA
Fernando Diaz  Yahoo!, Montreal, PQ, Canada
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1645953.1646115

ABSTRACT

In some retrieval situations, a system must search across multiple collections. This task, referred to as federated search, occurs, for example, when searching a distributed index or aggregating content for web search. Resource selection refers to the subtask of deciding, given a query, which collections to search. Most existing resource selection methods rely on evidence found in collection content. We present an approach to resource selection that combines multiple sources of evidence to inform the selection decision. We derive evidence from three different sources: collection documents, the topic of the query, and query click-through data. We combine this evidence by treating resource selection as a multiclass machine learning problem. Although machine-learned approaches often require large amounts of manually generated training data, we present a method for using automatically generated training data. We make use of and compare against prior resource selection work and evaluate across three experimental testbeds.
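
As a rough illustration of the idea summarized above, and not the authors' implementation, the sketch below treats resource selection as one binary classifier per collection over three evidence features, then ranks collections by classifier confidence. The collection names, feature values, labels, and the choice of logistic regression trained by stochastic gradient ascent are all assumptions made for this example; the abstract does not specify any of them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(examples, epochs=300, lr=0.5):
    """Fit a logistic regression by stochastic gradient ascent.

    examples: list of (features, label) pairs, label in {0, 1}.
    Returns (weights, bias).
    """
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def score(model, x):
    """Probability that the collection is worth searching, given features x."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical toy data. Each feature vector is
# [content_evidence, query_topic_evidence, click_through_evidence],
# all scaled to [0, 1]; label 1 means "collection was relevant".
TRAIN = {
    "news": [
        ([0.9, 0.8, 0.7], 1), ([0.8, 0.9, 0.6], 1),
        ([0.2, 0.1, 0.0], 0), ([0.1, 0.3, 0.2], 0),
    ],
    "sports": [
        ([0.7, 0.9, 0.8], 1), ([0.9, 0.7, 0.9], 1),
        ([0.2, 0.2, 0.1], 0), ([0.3, 0.1, 0.0], 0),
    ],
}

# One classifier per collection (a one-vs-rest reading of "multiclass").
models = {name: train_logistic(ex) for name, ex in TRAIN.items()}

def select(query_features, k=1):
    """Return the k collections with the highest classifier score.

    query_features maps collection name -> feature vector for this query.
    """
    ranked = sorted(
        query_features,
        key=lambda name: score(models[name], query_features[name]),
        reverse=True,
    )
    return ranked[:k]

if __name__ == "__main__":
    query = {"news": [0.85, 0.9, 0.8], "sports": [0.1, 0.2, 0.1]}
    print(select(query, k=1))
```

In this sketch the labels are hand-set toy values; the abstract instead describes generating training data automatically, which is not modeled here.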


