Validity and power of t-test for comparing MAP and GMAP
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Amsterdam, The Netherlands
POSTER SESSION: Posters
Pages: 753-754
Year of Publication: 2007
ISBN: 978-1-59593-597-7
Authors
Gordon V. Cormack  University of Waterloo, Waterloo, ON, Canada
Thomas R. Lynam  University of Waterloo, Waterloo, ON, Canada
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 81,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/1277741.1277892

ABSTRACT

We examine the validity and power of the t-test, Wilcoxon test, and sign test in determining whether or not the difference in performance between two IR systems is significant. Empirical tests conducted on subsets of the TREC 2004 Robust Retrieval collection indicate that the p-values computed by these tests for the difference in mean average precision (MAP) between two systems are very accurate for a wide range of sample sizes and significance estimates. Similarly, these tests have good power, with the t-test proving superior overall. The t-test is also valid for comparing geometric mean average precision (GMAP), exhibiting slightly superior accuracy and slightly inferior power compared with its use for MAP comparison.
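
The abstract does not show how such tests are computed, so the following is only a minimal sketch of the general technique, not the authors' procedure. It assumes per-topic average precision (AP) scores are available for both systems, computes a paired t statistic on the raw scores (MAP comparison) and on log-transformed scores (GMAP comparison), and computes an exact two-sided sign test p-value. The score lists, the function names, and the floor value `eps` used before taking logarithms are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for the paired differences a[i] - b[i].

    Raises ZeroDivisionError if all differences are identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def log_ap(scores, eps=1e-5):
    """Log-transform AP scores; eps is an assumed floor that avoids log(0)."""
    return [math.log(max(s, eps)) for s in scores]


def sign_test_p(a, b):
    """Exact two-sided sign test p-value; tied topics are discarded."""
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    n = wins + losses
    k = min(wins, losses)
    # Probability of k or fewer successes under Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-topic AP scores for two systems (not data from the paper).
sys_a = [0.42, 0.10, 0.55, 0.31, 0.08, 0.67, 0.25, 0.49]
sys_b = [0.38, 0.05, 0.50, 0.33, 0.02, 0.60, 0.21, 0.45]

# MAP comparison: t statistic on the raw per-topic AP differences.
t_map, df = paired_t_statistic(sys_a, sys_b)

# GMAP comparison: the same test applied to differences in log AP,
# since GMAP is the exponential of the mean log AP.
t_gmap, _ = paired_t_statistic(log_ap(sys_a), log_ap(sys_b))

p_sign = sign_test_p(sys_a, sys_b)
print(f"t (MAP) = {t_map:.3f}, t (GMAP) = {t_gmap:.3f}, df = {df}")
print(f"sign test p = {p_sign:.4f}")
```

Turning the t statistics into p-values requires the CDF of Student's t distribution with `df` degrees of freedom, which the Python standard library does not provide; a statistics package would normally supply it.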



