Are click-through data adequate for learning web search rankings?
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM conference on Information and knowledge management
Napa Valley, California, USA
SESSION: IR: web search 1
Pages 73-82
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Zhicheng Dou, Microsoft Research Asia, Beijing, China
Ruihua Song, Microsoft Research Asia, Beijing, China
Xiaojie Yuan, Nankai University, Tianjin, China
Ji-Rong Wen, Microsoft Research Asia, Beijing, China
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458095

ABSTRACT

Learning-to-rank algorithms, which automatically adapt ranking functions in web search, require a large volume of training data. A traditional way of generating training examples is to employ human experts to judge the relevance of documents, which is difficult, time-consuming, and costly. In this paper, we study the problem of learning web search rankings from click-through data, which can be collected at much lower cost. We extract pairwise relevance preferences from a large-scale aggregated click-through dataset, compare these preferences with explicit human judgments, and use them as training examples to learn ranking functions. We find that click-through data are useful and effective for learning ranking functions: a straightforward use of aggregated click-through data can outperform human judgments. We demonstrate that the strategies are only slightly affected by fraudulent clicks. We also show that highly reliable pairs alone, e.g., pairs of documents with large click frequency differences, are not sufficient for learning.



