How does clickthrough data reflect retrieval quality?
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
SESSION: IR: web search 1
Pages 43-52  
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Filip Radlinski, Cornell University, Ithaca, NY, USA
Madhu Kurup, Cornell University, Ithaca, NY, USA
Thorsten Joachims, Cornell University, Ithaca, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458092

ABSTRACT

Automatically judging the quality of retrieval functions based on observable user behavior holds promise for making retrieval evaluation faster, cheaper, and more user centered. However, the relationship between observable user behavior and retrieval quality is not yet fully understood. We present a sequence of studies investigating this relationship for an operational search engine on the arXiv.org e-print archive. We find that none of the eight absolute usage metrics we explore (e.g., number of clicks, frequency of query reformulations, abandonment) reliably reflect retrieval quality for the sample sizes we consider. However, we find that paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions. In particular, we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs, and we find that both give accurate and consistent results. We conclude that both paired comparison tests give substantially more accurate and sensitive evaluation results than absolute usage metrics in our domain.
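The abstract does not spell out how the two paired comparison tests work, so the following is only a minimal sketch of one interleaving scheme, commonly described as team-draft interleaving, together with a simple click-credit rule. It is an assumption that this resembles either test studied in the paper; the function names `team_draft_interleave` and `credit_clicks`, the document identifiers, and the click data are all hypothetical and chosen for illustration.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings into one list and record which side supplied each doc.

    Illustrative sketch only (assumed scheme, not taken from the abstract).
    Returns (combined, team), where team[doc] is "A" or "B".
    """
    rng = rng or random.Random()
    sources = {"A": ranking_a, "B": ranking_b}
    all_docs = set(ranking_a) | set(ranking_b)
    combined, team = [], {}
    count = {"A": 0, "B": 0}
    while len(combined) < len(all_docs):
        # The side with fewer picks goes next; ties are broken by a coin flip.
        if count["A"] != count["B"]:
            order = ["A", "B"] if count["A"] < count["B"] else ["B", "A"]
        else:
            order = rng.sample(["A", "B"], 2)
        for side in order:
            # Highest-ranked document from this side that is not yet shown.
            pick = next((d for d in sources[side] if d not in team), None)
            if pick is not None:
                combined.append(pick)
                team[pick] = side
                count[side] += 1
                break  # fall through to the other side only if this one is exhausted
    return combined, team


def credit_clicks(team, clicked_docs):
    """Count clicks per side; the side with more clicks wins this query."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team:
            wins[team[doc]] += 1
    return wins


if __name__ == "__main__":
    # Hypothetical rankings from two retrieval functions for one query.
    ranking_a = ["a1", "a2", "a3"]
    ranking_b = ["b1", "b2", "b3"]
    combined, team = team_draft_interleave(ranking_a, ranking_b, random.Random(0))
    print(combined)
    # Hypothetical clicks observed on the interleaved list.
    print(credit_clicks(team, ["a1", "a2", "b1"]))
```

Aggregating the per-query winners over many queries, and checking whether one side wins significantly more often than the other, gives a relative judgment of the two retrieval functions without requiring any absolute usage metric.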


