Information retrieval system evaluation: effort, sensitivity, and reliability
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Salvador, Brazil
SESSION: Evaluation
Pages: 162 - 169
Year of Publication: 2005
ISBN: 1-59593-034-5
ABSTRACT
The effectiveness of information retrieval systems is measured by comparing performance on a common set of queries and documents. Significance tests are often used to evaluate the reliability of such comparisons. Previous work has examined such tests, but produced results with limited application. Other work established an alternative benchmark for significance, but the resulting test was too stringent. In this paper, we revisit the question of how such tests should be used. We find that the t-test is highly reliable (more so than the sign or Wilcoxon test), and is far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems. Our results show that past empirical work on significance tests over-estimated the error of such tests. We also re-consider comparisons between the reliability of precision at rank 10 and mean average precision, arguing that past comparisons did not consider the assessor effort required to compute such measures. This investigation shows that assessor effort would be better spent building test collections with more topics, each assessed in less detail.
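As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of two paired significance tests applied to per-topic scores from two systems. It is not taken from the paper: the function names `paired_t_statistic` and `sign_test_p_value` and the score lists are hypothetical, and only the Python standard library is assumed. The t-test part returns the test statistic and degrees of freedom rather than a p-value, since the standard library has no t-distribution.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on per-topic scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def sign_test_p_value(a, b):
    """Return the two-sided exact sign test p-value; tied topics are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    wins = sum(1 for d in diffs if d > 0)
    k = min(wins, n - wins)
    # Probability of k or fewer wins under a fair coin, doubled for two sides.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical average-precision scores for eight topics (illustrative only).
system_a = [0.40, 0.55, 0.30, 0.62, 0.48, 0.51, 0.35, 0.70]
system_b = [0.35, 0.50, 0.32, 0.55, 0.40, 0.49, 0.30, 0.60]

t, df = paired_t_statistic(system_a, system_b)
p_sign = sign_test_p_value(system_a, system_b)
print(f"paired t = {t:.3f} with {df} degrees of freedom")
print(f"sign test two-sided p = {p_sign:.4f}")
```

The sign test uses only the direction of each per-topic difference, while the t statistic also uses its magnitude, which is one reason the two tests can disagree on the same data.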
CITED BY 56
Ben Carterette, Virgil Pavlu, Evangelos Kanoulas, Javed A. Aslam, James Allan, Evaluation over thousands of queries, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
Jianhan Zhu, Jun Wang, Vishwa Vinay, Ingemar J. Cox, Topic (query) selection for IR evaluation, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
Tanuja Bompada, Chi-Chao Chang, John Chen, Ravi Kumar, Rajesh Shenoy, On the robustness of relevance measures with incomplete judgments, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
Edleno Silva de Moura, Celia Francisca dos Santos, Bruno Dos santos de Araujo, Altigran Soares da Silva, Pavel Calado, Mario A. Nascimento, Locality-Based pruning methods for web search, ACM Transactions on Information Systems (TOIS), v.26 n.2, p.1-28, March 2008
Thomas Mandl, Christa Womser-Hacker, Giorgio Di Nunzio, Nicola Ferro, How robust are multilingual information retrieval systems?, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
Qing Li, Yuanzhu Peter Chen, Sung-Hyon Myaeng, Yun Jin, Bo-Yeong Kang, Concept unification of terms in different languages via web mining for Information Retrieval, Information Processing and Management: an International Journal, v.45 n.2, p.246-262, March 2009
Sethuramalingam Subramaniam, Anil Kumar Singh, Pradeep Dasigi, Vasudeva Varma, Experiments in CLIR using fuzzy string search based on surface similarity, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA