ABSTRACT
Familiar evaluation methodologies for information retrieval (IR) are not well suited to comparing systems in many real settings. These systems and evaluation methods must support contextual, interactive retrieval over changing, heterogeneous data collections, including private and confidential information.

We have implemented a comparison tool that can be inserted into the natural IR process. It provides a familiar search interface, presents a small number of result sets in side-by-side panels, elicits searcher judgments, and logs interaction events. The tool permits study of real information needs as they occur, uses the documents actually available at the time of the search, and records judgments that take into account the searcher's instantaneous needs.

We have validated our proposed evaluation approach and explored potential biases by comparing different whole-of-Web search facilities using a Web-based version of the tool. In four experiments, one with supplied queries in the laboratory and three with real queries in the workplace, subjects showed no discernible left-right bias and were able to reliably distinguish between high- and low-quality result sets. We found that judgments were strongly predicted by simple implicit measures.

Following validation, we undertook a case study comparing two leading whole-of-Web search engines. The approach is now being used in several ongoing investigations.
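The abstract does not say how result sets were assigned to panels or how left-right bias was tested. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes two engines per comparison, a random left/right placement for each query, and an exact two-sided binomial (sign) test of left-panel preferences against a 50% chance rate, with "no preference" judgments excluded. The names `Trial`, `assign_panels`, `left_right_bias`, `engine_wins`, `EngineA`, and `EngineB` are hypothetical.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trial:
    """One side-by-side comparison (hypothetical record): panel placement and judgment."""
    query: str
    left_engine: str
    right_engine: str
    preferred_side: str  # assumed values: "left", "right", or "none" (no preference)


def assign_panels(engine_a, engine_b, rng):
    """Randomly place the two engines in the left/right panels to counterbalance position."""
    return (engine_a, engine_b) if rng.random() < 0.5 else (engine_b, engine_a)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no likelier than k."""
    probs = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps outcomes tied in probability with k in the sum.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def left_right_bias(trials):
    """Count left-panel preferences among decisive trials and test against chance (p = 0.5)."""
    decisive = [t for t in trials if t.preferred_side in ("left", "right")]
    n_left = sum(t.preferred_side == "left" for t in decisive)
    return n_left, len(decisive), binomial_two_sided_p(n_left, len(decisive))


def engine_wins(trials):
    """Map each decisive judgment back to the engine shown on the preferred side."""
    wins = Counter()
    for t in trials:
        if t.preferred_side == "left":
            wins[t.left_engine] += 1
        elif t.preferred_side == "right":
            wins[t.right_engine] += 1
    return wins


if __name__ == "__main__":
    rng = random.Random(0)
    trials = []
    for q in ["query one", "query two", "query three", "query four"]:
        left, right = assign_panels("EngineA", "EngineB", rng)
        # Simulated searcher who always prefers EngineA, wherever it appears.
        side = "left" if left == "EngineA" else "right"
        trials.append(Trial(q, left, right, side))
    print(left_right_bias(trials))
    print(engine_wins(trials))
```

Because placement is randomized per query, position preference (`left_right_bias`) and engine preference (`engine_wins`) are tallied separately from the same judgments; a small p-value from the first would suggest a panel-position bias of the kind the experiments looked for.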