Agreement among statistical significance tests for information retrieval evaluation at varying sample sizes
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
POSTER SESSION: Posters
Pages 630-631
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Mark D. Smucker  University of Waterloo, Waterloo, ON, Canada
James Allan  University of Massachusetts Amherst, Amherst, MA, USA
Ben Carterette  University of Delaware, Newark, DE, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1572050

ABSTRACT

Research has shown that little practical difference exists between the randomization, Student's paired t, and bootstrap tests of statistical significance for TREC ad-hoc retrieval experiments with 50 topics. We compared these three tests on runs with as few as 10 topics and found that the tests increasingly disagree as the number of topics decreases. At smaller numbers of topics, the randomization test tended to produce smaller p-values than the t-test for p-values less than 0.1. The bootstrap exhibited a systematic bias towards p-values strictly less than those of the t-test, and this bias increased as the number of topics decreased. We recommend the randomization test, although the t-test appears to be suitable even when the number of topics is small.
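
As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of the three tests applied to paired per-topic scores from two retrieval systems. It is not the authors' implementation. The function names, the per-topic scores, and the choice of the shift-method bootstrap on the mean difference are all assumptions made for this example; the abstract does not specify which bootstrap variant was used. With 10 topics the randomization test can be computed exactly by enumerating all 2^10 sign assignments.

```python
import itertools
import math
import random
import statistics


def paired_differences(scores_a, scores_b):
    """Per-topic score differences (system A minus system B)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need one score per topic")
    return [a - b for a, b in zip(scores_a, scores_b)]


def randomization_p_value(diffs):
    """Exact two-sided randomization test: enumerate every sign assignment."""
    observed = abs(sum(diffs))
    extreme = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)))
        if permuted >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / 2 ** len(diffs)


def t_two_sided_p(t, df, steps=20000):
    """Two-sided p-value of Student's t via Simpson's rule (steps must be even)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)


def paired_t_p_value(diffs):
    """Two-sided Student's paired t-test on the per-topic differences."""
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t_two_sided_p(t, n - 1)


def bootstrap_p_value(diffs, samples=10000, seed=0):
    """Two-sided bootstrap test (shift method, assumed variant)."""
    rng = random.Random(seed)
    mean = statistics.fmean(diffs)
    centered = [d - mean for d in diffs]  # shift so the null hypothesis holds
    extreme = 0
    for _ in range(samples):
        resample = rng.choices(centered, k=len(diffs))
        if abs(statistics.fmean(resample)) >= abs(mean):
            extreme += 1
    return extreme / samples


# Made-up average precision scores for 10 topics (illustrative only).
system_a = [0.31, 0.45, 0.12, 0.58, 0.27, 0.40, 0.66, 0.19, 0.35, 0.50]
system_b = [0.28, 0.41, 0.15, 0.49, 0.25, 0.33, 0.60, 0.21, 0.30, 0.44]
diffs = paired_differences(system_a, system_b)
print("randomization:", randomization_p_value(diffs))
print("paired t:     ", paired_t_p_value(diffs))
print("bootstrap:    ", bootstrap_p_value(diffs))
```

Exact enumeration is only practical for small topic counts; at 50 topics a sketch like this would need to sample sign assignments at random instead.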


