ABSTRACT
Several recent studies have demonstrated that the kinds of improvement in information retrieval system effectiveness reported in forums such as SIGIR and TREC do not translate into a benefit for users. Two of the studies used an instance recall task, and a third used a question answering task, so perhaps it is unsurprising that precision-based measures of IR system effectiveness on one-shot query evaluation do not correlate with user performance on these tasks. In this study, we evaluate two different information retrieval tasks on TREC Web-track data: a precision-based user task, measured by the length of time that users need to find a single document that is relevant to a TREC topic; and a simple recall-based task, represented by the total number of relevant documents that users can identify within five minutes. Users employ search engines with controlled mean average precision (MAP) of between 55% and 95%. Our results show that there is no significant relationship between system effectiveness measured by MAP and the precision-based task. A significant but weak relationship is present for the precision at one document returned metric. A weak relationship is present between MAP and the simple recall-based task.
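For readers unfamiliar with the two system-side measures named in the abstract, the following is a minimal sketch of how mean average precision (MAP) and precision at one document returned are commonly computed from binary relevance judgments. It follows the standard textbook definitions and is not taken from the study itself; the function names and the toy rankings are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool], num_relevant: int) -> float:
    """Sum of precision at each rank holding a relevant document,
    divided by the total number of relevant documents for the topic."""
    if num_relevant <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / num_relevant


def mean_average_precision(runs: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Mean of per-topic average precision over (ranking, num_relevant) pairs."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)


# Hypothetical toy data: two topics, each a ranked list of relevance flags
# paired with the number of relevant documents known for that topic.
toy_runs = [
    ([True, False, True, False], 2),
    ([False, True, False, False], 1),
]

map_score = mean_average_precision(toy_runs)
p_at_1 = [precision_at_k(ranking, 1) for ranking, _ in toy_runs]
```

In this sketch, a system with a relevant document at rank 1 scores a precision at one of 1.0 for that topic regardless of how the rest of the list looks, whereas MAP rewards relevant documents throughout the ranking. That difference is one reason the two measures can relate differently to a find-one-document user task.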