Robust test collections for retrieval evaluation
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Amsterdam, The Netherlands
Session: Evaluation I
Pages: 55-62
Year of Publication: 2007
ISBN: 978-1-59593-597-7
Author
Ben Carterette, University of Massachusetts Amherst
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1277741.1277754

ABSTRACT

Low-cost methods for acquiring relevance judgments can be a boon to researchers who need to evaluate new retrieval tasks or topics but do not have the resources to make thousands of judgments. While these judgments are very useful for a one-time evaluation, it is not clear that they can be trusted when re-used to evaluate new systems. In this work, we formally define what it means for judgments to be reusable: the confidence in an evaluation of new systems can be accurately assessed from an existing set of relevance judgments. We then present a method for augmenting a set of relevance judgments with relevance estimates that require no additional assessor effort. Using this method practically guarantees reusability: with as few as five judgments per topic taken from only two systems, we can reliably evaluate a larger set of ten systems. Even the smallest sets of judgments can be useful for evaluation of new systems.
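
To make the idea of "confidence in an evaluation" concrete, the sketch below estimates how likely it is that one system outperforms another when only some documents have been judged. It is an illustration under stated assumptions, not the method of the paper: the function names, the choice of precision at k as the measure, the Monte Carlo sampling, and the `p_rel` input (estimated relevance probabilities for unjudged documents) are all hypothetical.

```python
import random


def precision_at_k(ranking, relevance, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(relevance[d] for d in ranking[:k]) / k


def confidence_a_beats_b(rank_a, rank_b, judged, p_rel, k=5, trials=10000, seed=0):
    """Monte Carlo estimate of P(precision@k of A > precision@k of B).

    judged: dict mapping document id -> 0/1 for assessor-judged documents.
    p_rel:  dict mapping document id -> estimated probability of relevance
            for unjudged documents (hypothetical input; how these estimates
            are produced is outside the scope of this sketch).
    """
    rng = random.Random(seed)
    # Only documents in either top-k list can affect precision@k.
    pool = set(rank_a[:k]) | set(rank_b[:k])
    unjudged = sorted(d for d in pool if d not in judged)
    wins = 0
    for _ in range(trials):
        relevance = dict(judged)
        # Sample a relevance label for every unjudged document.
        for d in unjudged:
            relevance[d] = 1 if rng.random() < p_rel[d] else 0
        if precision_at_k(rank_a, relevance, k) > precision_at_k(rank_b, relevance, k):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    rank_a = ["d1", "d2", "d3"]
    rank_b = ["d4", "d5", "d6"]
    judged = {"d1": 1, "d4": 0}            # two assessor judgments
    p_rel = {"d2": 0.7, "d3": 0.4, "d5": 0.5, "d6": 0.2}
    print(confidence_a_beats_b(rank_a, rank_b, judged, p_rel, k=3))
```

A value near 1 (or near 0) would suggest that the existing judgments, together with the relevance estimates, already determine which system is better; a value near 0.5 would suggest that more judgments are needed.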

