Why batch and user evaluations do not give the same results
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
New Orleans, Louisiana, United States
Pages: 225-231
Year of Publication: 2001
ISBN: 1-58113-331-6
Authors
Andrew H. Turpin, Curtin Univ. of Technology, Perth, WA, Australia
William Hersh, Oregon Health Sciences Univ., Portland, OR
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/383952.383992

ABSTRACT

Much system-oriented evaluation of information retrieval systems has used the Cranfield approach based upon queries run against test collections in a batch mode. Some researchers have questioned whether this approach can be applied to the real world, but little data exists for or against that assertion. We have studied this question in the context of the TREC Interactive Track. Previous results demonstrated that improved performance as measured by relevance-based metrics in batch studies did not correspond with the results of outcomes based on real user searching tasks. The experiments in this paper analyzed those results to determine why this occurred. Our assessment showed that while the queries entered by real users into systems yielding better results in batch studies gave comparable gains in ranking of relevant documents for those users, they did not translate into better performance on specific tasks. This was most likely due to users being able to adequately find and utilize relevant documents ranked further down the output list.


