ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Topic set size redux
Full text PdfPdf (261 KB)
Source
Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval table of contents
Boston, MA, USA
POSTER SESSION: Posters table of contents
Pages: 806-807  
Year of Publication: 2009
ISBN:978-1-60558-483-6
Author
Ellen M. Voorhees  National Institute of Standards and Technology, Gaithersburg, MD, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 106,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1571941.1572138
What is a DOI?

ABSTRACT

The cost as well as the power and reliability of a retrieval test collection are all proportional to the number of topics included in it. Test collections created through community evaluations such as TREC generally use 50 topics. Prior work estimated the reliability of 50-topic sets by extrapolating confidence levels from those of smaller sets, and concluded that 50 topics are sufficient to have high confidence in a comparison, especially when the comparison is statistically significant. Using topic sets that actually contain 50 topics, this paper shows that statistically significant differences can be wrong, even when statistical significance is accompanied by moderately large (>10%) relative differences in scores. Further, using standardized evaluation scores rather than raw evaluation scores does not increase the reliability of these paired comparisons. Researchers should continue to be skeptical of conclusions demonstrated on only a single test collection.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
4
5
6
7