How reliable are the results of large-scale information retrieval experiments?
Source Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Melbourne, Australia
Pages: 307 - 314  
Year of Publication: 1998
ISBN: 1-58113-015-5
Author
Justin Zobel, Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia
Sponsors
University of Melbourne
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 27,   Downloads (12 Months): 120,   Citation Count: 90
DOI: http://doi.acm.org/10.1145/290941.291014

ABSTRACT

Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.
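
To illustrate why recall measured against pooled judgments can be overestimated, the following is a minimal sketch of conventional fixed-depth pooling, in which the top k documents from each system are merged and only that pool is assessed. It is not the new pooling strategy proposed in the paper, which the abstract does not describe. The function names, document identifiers, rankings, and relevance labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Union of the top-k documents from every run (the set sent to assessors)."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def recall(ranking: List[str], relevant: Set[str]) -> float:
    """Fraction of the given relevant set that appears anywhere in the ranking."""
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranking)) / len(relevant)


# Hypothetical toy data: ids, rankings, and relevance labels are invented.
runs = {
    "system_a": ["d1", "d2", "d3", "d4", "d9"],
    "system_b": ["d2", "d5", "d1", "d6", "d7"],
}
true_relevant = {"d1", "d2", "d4", "d7"}  # unknown to the experimenter in practice

pool = depth_k_pool(runs, k=2)           # only these documents are assessed
judged_relevant = pool & true_relevant   # relevant documents actually found

# Recall against the judged set treats unassessed documents as non-relevant.
measured = recall(runs["system_a"], judged_relevant)
actual = recall(runs["system_a"], true_relevant)
print(f"measured recall = {measured:.2f}, actual recall = {actual:.2f}")
```

In this toy case the pool misses two relevant documents, so the measured recall of system_a is higher than its recall against the full relevant set. The size of any such gap in real experiments depends on the pool depth and on how much the contributing systems overlap.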

