Score standardization for inter-collection comparison of retrieval systems
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Evaluation--1
Pages: 51-58
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Bibliometrics
Downloads (6 Weeks): 21, Downloads (12 Months): 206, Citation Count: 10
ABSTRACT
The goal of system evaluation in information retrieval has always been to determine which of a set of systems is superior on a given collection. The tool used to determine system ordering is an evaluation metric such as average precision, which computes relative, collection-specific scores. We argue that a broader goal is achievable. In this paper we demonstrate that, by use of standardization, scores can be substantially independent of a particular collection, allowing systems to be compared even when they have been tested on different collections. Compared to current methods, our techniques provide richer information about system performance, improved clarity in outcome reporting, and greater simplicity in reviewing results from disparate sources.
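The abstract does not spell out the standardization procedure, so the following is only a minimal sketch under one common reading: each topic's raw score is converted to a z-score using the mean and standard deviation of the scores that a set of reference systems obtained on that same topic. The function name `standardize`, the dictionary layout, the use of the sample standard deviation, and all numeric values are illustrative assumptions, not taken from the paper.

```python
from statistics import mean, stdev


def standardize(system_scores, reference_scores):
    """Convert raw per-topic scores to z-scores (hypothetical helper).

    system_scores:    {topic: raw score of the system under test}
    reference_scores: {topic: list of raw scores from reference systems}

    Assumption: standardization is a per-topic z-score relative to the
    reference systems; the sample standard deviation is an arbitrary choice.
    """
    standardized = {}
    for topic, score in system_scores.items():
        ref = reference_scores[topic]
        mu = mean(ref)
        sigma = stdev(ref)  # needs at least two reference scores
        # A topic on which every reference system scores the same
        # carries no spread information; map it to 0.0 here.
        standardized[topic] = (score - mu) / sigma if sigma > 0 else 0.0
    return standardized


# Made-up average precision values for two topics of differing difficulty.
reference = {"t1": [0.10, 0.20, 0.30], "t2": [0.60, 0.70, 0.80]}
system = {"t1": 0.30, "t2": 0.70}

z = standardize(system, reference)
for topic in sorted(z):
    print(topic, round(z[topic], 3))
```

In this toy data the system's raw score is lower on the hard topic `t1` than on the easy topic `t2`, yet its standardized score is higher on `t1`, because it sits above the reference mean there and exactly at the mean on `t2`. This is the sense in which standardized scores depend less on the difficulty of a particular topic or collection than raw scores do.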
CITED BY 10
Timothy G. Armstrong, Alistair Moffat, William Webber, Justin Zobel, Has adhoc retrieval improved since 1994?, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
Chung Tong Lee, Vishwa Vinay, Eduarda Mendes Rodrigues, Gabriella Kazai, Nataša Milic-Frayling, Aleksandar Ignjatovic, Measuring system performance and topic discernment using generalized adaptive-weight mean, Proceeding of the 18th ACM conference on Information and knowledge management, November 02-06, 2009, Hong Kong, China