The effect of topic set size on retrieval experiment error
Source Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Tampere, Finland
SESSION: Evaluation
Pages: 316 - 323
Year of Publication: 2002
ISBN:1-58113-561-0
Authors
Ellen M. Voorhees  National Institute of Standards and Technology, Gaithersburg, MD
Chris Buckley  Sabir Research Inc, Gaithersburg, MD
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/564376.564432
ABSTRACT

Retrieval mechanisms are frequently compared by computing the respective average scores for some effectiveness metric across a common set of information needs or topics, with researchers concluding one method is superior based on those averages. Since comparative retrieval system behavior is known to be highly variable across topics, good experimental design requires that a "sufficient" number of topics be used in the test. This paper uses TREC results to empirically derive error rates based on the number of topics used in a test and the observed difference in the average scores. The error rates quantify the likelihood that a different set of topics of the same size would lead to a different conclusion. We directly compute error rates for topic sets up to size 25, and extrapolate those rates for larger topic set sizes. The error rates found are larger than anticipated, indicating researchers need to take care when concluding one method is better than another, especially if few topics are used.
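
The error rate described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function name `swap_rate`, its parameters, and the random resampling scheme are assumptions made for illustration. Given per-topic scores for two systems, it repeatedly draws two disjoint topic sets of the same size and counts how often the two sets disagree about which system has the higher average. The paper additionally conditions on the observed difference in average scores and extrapolates to larger set sizes; this sketch does neither.

```python
import random


def swap_rate(scores_a, scores_b, set_size, trials=1000, seed=0):
    """Estimate how often two disjoint topic sets of `set_size` topics
    disagree on which system has the higher mean score.

    scores_a, scores_b: per-topic effectiveness scores for systems A and B,
    aligned by topic (hypothetical inputs, e.g. average precision per topic).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need a score for every topic")
    n_topics = len(scores_a)
    if set_size < 1 or 2 * set_size > n_topics:
        raise ValueError("need at least 2 * set_size topics for disjoint sets")

    rng = random.Random(seed)
    topics = list(range(n_topics))
    swaps = 0
    for _ in range(trials):
        rng.shuffle(topics)
        first = topics[:set_size]
        second = topics[set_size:2 * set_size]
        # Mean score difference (A minus B) on each topic set.
        diff_first = sum(scores_a[t] - scores_b[t] for t in first) / set_size
        diff_second = sum(scores_a[t] - scores_b[t] for t in second) / set_size
        # A swap: the two sets rank the systems in opposite order.
        # Ties (a zero difference) are not counted as swaps.
        if diff_first * diff_second < 0:
            swaps += 1
    return swaps / trials
```

Called as `swap_rate(scores_a, scores_b, set_size=25)` on per-topic scores from a collection with at least 50 topics, the function returns the fraction of trials in which the two topic sets reach opposite conclusions. Under this simplified scheme, smaller values of `set_size` would generally be expected to produce higher rates for the same pair of systems.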


