Why batch and user evaluations do not give the same results
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
New Orleans, Louisiana, United States
Pages: 225-231
Year of Publication: 2001
ISBN: 1-58113-331-6
ABSTRACT
Much system-oriented evaluation of information retrieval systems has used the Cranfield approach, in which queries are run against test collections in batch mode. Some researchers have questioned whether this approach can be applied to the real world, but little data exists for or against that assertion. We have studied this question in the context of the TREC Interactive Track. Previous results demonstrated that improved performance, as measured by relevance-based metrics in batch studies, did not correspond with improved outcomes on real user searching tasks. The experiments in this paper analyzed those results to determine why this occurred. Our assessment showed that while the queries entered by real users into systems yielding better results in batch studies gave comparable gains in the ranking of relevant documents for those users, these gains did not translate into better performance on specific tasks. This was most likely because users were able to find and use relevant documents ranked further down the output list.