Strategic system comparisons via targeted relevance judgments
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Amsterdam, The Netherlands
SESSION: Evaluation II
Pages: 375-382
Year of Publication: 2007
ISBN: 978-1-59593-597-7
ABSTRACT
Relevance judgments are used to compare text retrieval systems. Given a collection of documents and queries, and a set of systems being compared, a standard approach to forming judgments is to manually examine all documents that are highly ranked by any of the systems. However, not all of these relevance judgments provide the same benefit to the final result, particularly if the aim is to identify which systems are best, rather than to fully order them. In this paper we propose new experimental methodologies that can significantly reduce the volume of judgments required in system comparisons. Using rank-biased precision, a recently proposed effectiveness measure, we show that judging around 200 documents for each of 50 queries in a TREC-scale system evaluation containing over 100 runs is sufficient to identify the best systems.
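As a rough illustration of why rank-biased precision (RBP) suits partial judgments, the sketch below computes the standard RBP score, (1 - p) * sum of r_i * p^(i-1), together with the residual weight carried by unjudged documents. It is a minimal sketch, not the selection procedure proposed in the paper: the function name `rbp_bounds`, the use of `None` to mark an unjudged document, and the default persistence p = 0.8 are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def rbp_bounds(judgments: Sequence[Optional[float]], p: float = 0.8) -> Tuple[float, float]:
    """Return (base, residual) for one ranked list under rank-biased precision.

    judgments[i] is the relevance (0..1) of the document at rank i + 1,
    or None if that document has not been judged (hypothetical convention).
    base     = (1 - p) * sum of r_i * p^(i - 1) over judged documents
    residual = total RBP weight of unjudged ranks plus the tail beyond the list
    The true score lies somewhere in [base, base + residual].
    """
    if not 0.0 < p < 1.0:
        raise ValueError("persistence p must lie strictly between 0 and 1")

    base = 0.0
    unjudged = 0.0
    weight = 1.0 - p  # weight of rank 1; each later rank is multiplied by p
    for rel in judgments:
        if rel is None:
            unjudged += weight
        else:
            base += weight * rel
        weight *= p

    # Ranks beyond the end of the list: sum_{i > n} (1 - p) p^(i - 1) = p^n
    tail = p ** len(judgments)
    return base, unjudged + tail


if __name__ == "__main__":
    base, residual = rbp_bounds([1, 0, None, 1], p=0.5)
    print(f"RBP is between {base} and {base + residual}")
```

Because the weights decay geometrically, judging a highly ranked unjudged document shrinks the residual far more than judging one deep in the list, which is the intuition behind spending judgments where they matter most.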
CITED BY 7
Thomas Mandl, Christa Womser-Hacker, Giorgio Di Nunzio, Nicola Ferro, How robust are multilingual information retrieval systems?, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil