Reliable information retrieval evaluation with incomplete and biased judgements
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Amsterdam, The Netherlands
SESSION: Evaluation I
Pages: 63-70
Year of Publication: 2007
ISBN: 978-1-59593-597-7
Authors
Stefan Büttcher  University of Waterloo
Charles L. A. Clarke  University of Waterloo
Peter C. K. Yeung  University of Waterloo
Ian Soboroff  National Institute of Standards and Technology
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1277741.1277755
ABSTRACT

Information retrieval evaluation based on the pooling method is inherently biased against systems that did not contribute to the pool of judged documents. This may distort the results obtained about the relative quality of the systems evaluated and thus lead to incorrect conclusions about the performance of a particular ranking technique.
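The pooling bias described above can be illustrated with a toy sketch. Everything here is hypothetical: the document IDs, the runs, the pool depth, and the helper names `build_pool` and `precision_at_k` are invented for illustration and are not taken from the paper. The sketch assumes the usual convention that documents outside the pool are treated as non-relevant.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from each contributing run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents that are in `relevant`."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical ground truth: which documents are actually relevant.
truly_relevant = {"d1", "d2", "d3", "d4"}

# Runs that contributed to the pool (hypothetical).
runs = {
    "A": ["d1", "d5", "d2"],
    "B": ["d2", "d1", "d6"],
}

# A system that did NOT contribute to the pool (hypothetical).
held_out = ["d3", "d4", "d1"]

pool = build_pool(runs, depth=3)

# Only pooled documents are ever judged; relevant documents outside
# the pool are silently counted as non-relevant.
judged_relevant = truly_relevant & pool

true_p3 = precision_at_k(held_out, truly_relevant, 3)
pooled_p3 = precision_at_k(held_out, judged_relevant, 3)
```

In this toy setting the held-out system retrieves only relevant documents, yet two of them were never judged, so its measured precision under the pooled judgements is lower than its true precision.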

We examine the magnitude of this effect and explore how it can be countered by automatically building an unbiased set of judgements from the original, biased judgements obtained through pooling. We compare the performance of this method with other approaches to the problem of incomplete judgements, such as bpref, and show that the proposed method leads to higher evaluation accuracy, especially when the set of manual judgements is rich in documents but highly biased against some systems.
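For readers unfamiliar with bpref, the baseline measure named above, the following is a minimal sketch of one commonly used formulation (the variant with min(R, N) in the denominator, as found in trec_eval-style implementations). This is an assumption about which variant is meant; the abstract does not specify one, and the sketch is not the method proposed in the paper. The function name and the toy data are hypothetical.

```python
def bpref(ranking, judged_relevant, judged_nonrelevant):
    """bpref for a single topic, ignoring unjudged documents.

    Each retrieved relevant document is penalized by the number of
    judged non-relevant documents ranked above it (capped at R),
    normalized by min(R, N).
    """
    R = len(judged_relevant)
    N = len(judged_nonrelevant)
    if R == 0:
        return 0.0

    nonrel_seen = 0  # judged non-relevant documents seen so far
    total = 0.0
    for doc in ranking:
        if doc in judged_nonrelevant:
            nonrel_seen += 1
        elif doc in judged_relevant:
            if nonrel_seen == 0:
                total += 1.0
            else:
                total += 1.0 - min(nonrel_seen, R) / min(R, N)
        # Unjudged documents contribute nothing either way.
    return total / R


# Hypothetical example: "u1" is unjudged and therefore skipped.
score = bpref(
    ranking=["d1", "u1", "n1", "d2", "n2", "d3"],
    judged_relevant={"d1", "d2", "d3"},
    judged_nonrelevant={"n1", "n2"},
)
```

Because unjudged documents are skipped rather than counted as non-relevant, a measure of this kind is less sensitive to missing judgements than precision-based measures computed directly over the pool.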



