Retrieval sensitivity under training using different measures
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Evaluation 1
Pages 67-74  
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Ben He  University of Glasgow, Glasgow, United Kingdom
Craig Macdonald  University of Glasgow, Glasgow, United Kingdom
Iadh Ounis  University of Glasgow, Glasgow, United Kingdom
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390334.1390348

ABSTRACT

Various measures, such as binary preference (bpref), inferred average precision (infAP), and binary normalised discounted cumulative gain (nDCG), have been proposed as alternatives to mean average precision (MAP) because they are less sensitive to incomplete relevance judgements. Since the primary aim of building a system is to train it to respond to user queries in a robust and stable manner, in this paper we investigate how important the choice of evaluation measure used for training is under different levels of evaluation incompleteness. We simulate evaluation incompleteness by sampling from the relevance assessments. Through large-scale experiments on two standard TREC test collections, we examine retrieval sensitivity under training, i.e. whether a training process based on each of the four measures discussed has an impact on the final retrieval performance. Experimental results show that training by bpref, infAP and nDCG provides significantly better retrieval performance than training by MAP when the relevance judgements are extremely incomplete. As the completeness of the relevance judgements increases, the measures behave more similarly.
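To make the measures above concrete, here is a minimal Python sketch of average precision (whose mean over queries is MAP), bpref and binary nDCG, plus a hypothetical `sample_qrels` helper that simulates incomplete judgements by keeping a uniform random fraction of the assessed documents. It assumes common trec_eval-style conventions (unjudged documents count as non-relevant for AP and nDCG, and are skipped by bpref); these conventions and the sampling scheme are illustrative assumptions, not necessarily the exact definitions or procedure used in the paper. infAP is omitted because it additionally needs pool-membership information.

```python
import math
import random

# qrels for one query: doc id -> 1 (relevant) or 0 (judged non-relevant).
# Documents absent from the dict are treated as unjudged.

def average_precision(ranking, qrels):
    """AP; unjudged documents count as non-relevant (trec_eval-style assumption)."""
    num_rel = sum(1 for v in qrels.values() if v > 0)
    if num_rel == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / num_rel


def bpref(ranking, qrels):
    """bpref; unjudged documents are skipped entirely."""
    num_rel = sum(1 for v in qrels.values() if v > 0)
    num_nonrel = sum(1 for v in qrels.values() if v == 0)
    if num_rel == 0:
        return 0.0
    nonrel_above, total = 0, 0.0
    for doc in ranking:
        if doc not in qrels:
            continue  # unjudged: ignored by bpref
        if qrels[doc] > 0:
            if nonrel_above == 0:
                total += 1.0
            else:
                # penalty: judged non-relevant docs ranked above, capped at R
                total += 1.0 - min(nonrel_above, num_rel) / min(num_rel, num_nonrel)
        else:
            nonrel_above += 1
    return total / num_rel


def binary_ndcg(ranking, qrels):
    """nDCG with binary gains over the full ranking; unjudged = non-relevant."""
    num_rel = sum(1 for v in qrels.values() if v > 0)
    if num_rel == 0:
        return 0.0
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranking, start=1)
              if qrels.get(doc, 0) > 0)
    # ideal ranking places all R relevant documents first
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, num_rel + 1))
    return dcg / idcg


def sample_qrels(qrels, fraction, rng):
    """Hypothetical helper: keep a uniform random fraction of the judged docs."""
    docs = sorted(qrels)
    if not docs:
        return {}
    k = max(1, round(fraction * len(docs)))
    return {d: qrels[d] for d in rng.sample(docs, k)}


def mean_over_queries(measure, run, qrels_by_query):
    """Average a per-query measure over all judged queries (e.g. AP -> MAP)."""
    scores = [measure(run.get(q, []), j) for q, j in qrels_by_query.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    run = {"q1": ["d1", "d2", "d3", "d4", "d5"]}
    full = {"q1": {"d1": 1, "d2": 0, "d3": 1, "d4": 0, "d5": 1, "d6": 1}}
    rng = random.Random(0)
    reduced = {q: sample_qrels(j, 0.5, rng) for q, j in full.items()}
    for name, fn in [("MAP", average_precision), ("bpref", bpref), ("nDCG", binary_ndcg)]:
        print(name, mean_over_queries(fn, run, full), mean_over_queries(fn, run, reduced))
```

Comparing each measure's score under the full and the sampled judgements gives a rough feel for its sensitivity to incompleteness; a training experiment would instead use the chosen measure as the objective when tuning system parameters.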

