Estimation and use of uncertainty in pseudo-relevance feedback
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Amsterdam, The Netherlands
Session: Formal models
Pages: 303-310
Year of Publication: 2007
ISBN: 978-1-59593-597-7
Authors
Kevyn Collins-Thompson, Carnegie Mellon University
Jamie Callan, Carnegie Mellon University
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1277741.1277795

ABSTRACT

Existing pseudo-relevance feedback methods typically perform averaging over the top-retrieved documents, but ignore an important statistical dimension: the risk or variance associated with either the individual document models, or their combination. Treating the baseline feedback method as a black box, and the output feedback model as a random variable, we estimate a posterior distribution for the feedback model by resampling a given query's top-retrieved documents, using the posterior mean or mode as the enhanced feedback model. We then perform model combination over several enhanced models, each based on a slightly modified query sampled from the original query. We find that resampling documents helps increase individual feedback model precision by removing noise terms, while sampling from the query improves robustness (worst-case performance) by emphasizing terms related to multiple query aspects. The result is a meta-feedback algorithm that is both more robust and more precise than the original strong baseline method.
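
The document-resampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `baseline_feedback` and `resampled_feedback`, the uniform bootstrap over documents, and the use of a simple empirical mean and variance in place of a fitted posterior are all assumptions made for illustration. The baseline here is a stand-in black box that averages per-document unigram distributions; any feedback method that maps a document list to a term distribution could be substituted.

```python
import random
from collections import Counter


def baseline_feedback(docs):
    """Hypothetical black-box baseline: average the unigram
    distributions of the given (non-empty) tokenized documents."""
    model = Counter()
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        for term, c in counts.items():
            model[term] += c / total / len(docs)
    return dict(model)


def resampled_feedback(docs, n_samples=200, seed=0):
    """Bootstrap the top-retrieved documents, run the baseline on each
    resample, and return the per-term empirical mean and variance.

    Assumption: uniform sampling with replacement; the abstract does
    not specify the sampling scheme or the posterior family.
    """
    rng = random.Random(seed)
    sums, sq_sums = Counter(), Counter()
    for _ in range(n_samples):
        sample = [rng.choice(docs) for _ in docs]  # resample with replacement
        for term, p in baseline_feedback(sample).items():
            sums[term] += p
            sq_sums[term] += p * p
    # Terms absent from a resample contribute probability 0 to that draw.
    mean = {t: s / n_samples for t, s in sums.items()}
    var = {t: max(sq_sums[t] / n_samples - mean[t] ** 2, 0.0) for t in mean}
    return mean, var


if __name__ == "__main__":
    top_docs = [["jaguar", "car", "speed"],
                ["jaguar", "cat", "jungle"],
                ["jaguar", "car", "engine"]]
    mean, var = resampled_feedback(top_docs, n_samples=100, seed=1)
    # Terms with high mean and low variance are stable across resamples.
    for term in sorted(mean, key=mean.get, reverse=True):
        print(f"{term}: mean={mean[term]:.3f} var={var[term]:.4f}")
```

In this sketch the mean plays the role of the enhanced feedback model, while the variance gives a rough signal of which terms depend on only a few documents. The query-sampling and model-combination stage is not shown; under the same assumptions it would repeat this procedure for each sampled query variant and combine the resulting models.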



