Risky business: modeling and exploiting uncertainty in information retrieval
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
Session: Retrieval models I
Pages 99-106
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Jianhan Zhu  University College London, London, United Kingdom
Jun Wang  University College London, London, United Kingdom
Ingemar J. Cox  University College London, London, United Kingdom
Michael J. Taylor  Microsoft Research, Cambridge, United Kingdom
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1571961

ABSTRACT

Most retrieval models estimate the relevance of each document to a query and rank the documents accordingly. However, such an approach ignores the uncertainty associated with the estimates of relevancy. If a high estimate of relevancy also has a high uncertainty, then the document may be very relevant or not relevant at all. Another document may have a slightly lower estimate of relevancy but the corresponding uncertainty may be much less. In such a circumstance, should the retrieval engine risk ranking the first document highest, or should it choose a more conservative (safer) strategy that gives preference to the second document? There is no definitive answer to this question, as it depends on the risk preferences of the user and the information retrieval system. In this paper we present a general framework for modeling uncertainty and introduce an asymmetric loss function with a single parameter that can model the level of risk the system is willing to accept. By adjusting the risk preference parameter, our approach can effectively adapt to users' different retrieval strategies.
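The trade-off between the two documents described above can be illustrated with a small sketch. It assumes, for illustration only, that the asymmetric loss is the LINEX loss from the Bayesian statistics literature and that each document's relevance estimate is Gaussian with a known mean and variance; under those assumptions the optimal score reduces to the mean minus half the risk parameter times the variance. The names `Doc`, `risk_adjusted_score`, `rank`, and the parameter `b` are hypothetical and are not taken from the paper, whose exact scoring function may differ.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    mean: float      # estimated relevance (made-up value for illustration)
    variance: float  # uncertainty of that estimate (made-up value)


def risk_adjusted_score(doc: Doc, b: float) -> float:
    """Score under a LINEX loss with a Gaussian relevance estimate.

    Assumption: the Bayes estimate -(1/b) * ln E[exp(-b * theta)] for a
    Gaussian theta equals mean - (b / 2) * variance.
    b > 0 penalises uncertainty (risk-averse), b < 0 rewards it
    (risk-seeking), and b = 0 ranks by the mean alone.
    """
    return doc.mean - 0.5 * b * doc.variance


def rank(docs: list[Doc], b: float) -> list[str]:
    """Return document names ordered by descending risk-adjusted score."""
    ordered = sorted(docs, key=lambda d: risk_adjusted_score(d, b), reverse=True)
    return [d.name for d in ordered]


# Document A: higher estimate, high uncertainty.
# Document B: slightly lower estimate, low uncertainty.
docs = [Doc("A", mean=0.80, variance=0.09), Doc("B", mean=0.75, variance=0.0025)]

for b in (-2.0, 0.0, 2.0):
    print(f"b = {b:+.1f}: ranking = {rank(docs, b)}")
```

With these made-up numbers, a risk-neutral or risk-seeking setting keeps document A first, while a sufficiently risk-averse setting moves document B ahead of it, which is the behaviour a single risk parameter is meant to control.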

We apply this asymmetric loss function to a language modeling framework and obtain a practical risk-aware document scoring function. Our experiments on several TREC collections show that our "risk-averse" approach significantly improves the Jelinek-Mercer smoothing language model, and that a combination of our "risk-averse" approach and the Jelinek-Mercer smoothing method generally outperforms the Dirichlet smoothing method. Experimental results also show that the "risk-averse" approach, even without smoothing from the collection statistics, performs as well as three commonly adopted retrieval models, namely the Jelinek-Mercer and Dirichlet smoothing methods and the BM25 model.
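For readers unfamiliar with the two smoothing baselines named above, the following sketch shows their standard textbook forms in a query-likelihood scorer. It is not the paper's risk-aware scoring function. The function names, the toy document, the collection probabilities, and the default values of `lam` and `mu` are all assumptions chosen for illustration; here `lam` is taken to be the weight on the collection model, although some authors use the opposite convention.

```python
import math
from collections import Counter


def jelinek_mercer(tf: int, doc_len: int, p_coll: float, lam: float = 0.5) -> float:
    """Linear interpolation of the document and collection models.

    Assumption: lam is the weight given to the collection model.
    """
    return (1.0 - lam) * tf / doc_len + lam * p_coll


def dirichlet(tf: int, doc_len: int, p_coll: float, mu: float = 2000.0) -> float:
    """Dirichlet-prior smoothing: collection model acts as mu pseudo-counts."""
    return (tf + mu * p_coll) / (doc_len + mu)


def query_log_likelihood(query, doc, collection_model, smooth) -> float:
    """Sum of log smoothed term probabilities over the query terms."""
    counts = Counter(doc)
    return sum(
        math.log(smooth(counts[w], len(doc), collection_model[w])) for w in query
    )


# Toy data (hypothetical): a four-term document and collection probabilities.
doc = ["risk", "aware", "ranking", "risk"]
collection_model = {"risk": 0.01, "ranking": 0.02, "uncertainty": 0.005}
query = ["risk", "uncertainty"]

print(query_log_likelihood(query, doc, collection_model, jelinek_mercer))
print(query_log_likelihood(query, doc, collection_model, dirichlet))
```

Both methods give the unseen query term "uncertainty" a non-zero probability by borrowing mass from the collection model, which is the role smoothing plays in these baselines.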


