| Are click-through data adequate for learning web search rankings? |
| Full text |
Pdf
(229 KB)
|
Source
|
Conference on Information and Knowledge Management
archive
Proceeding of the 17th ACM conference on Information and knowledge management
table of contents
Napa Valley, California, USA
SESSION: IR: web search 1
table of contents
Pages 73-82
Year of Publication: 2008
ISBN:978-1-59593-991-3
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 37, Downloads (12 Months): 185, Citation Count: 1
|
|
|
ABSTRACT
Learning-to-rank algorithms, which can automatically adapt ranking functions in web search, require a large volume of training data. A traditional way of generating training examples is to employ human experts to judge the relevance of documents. Unfortunately, it is difficult, time-consuming and costly. In this paper, we study the problem of exploiting click-through data for learning web search rankings that can be collected at much lower cost. We extract pairwise relevance preferences from a large-scale aggregated click-through dataset, compare these preferences with explicit human judgments, and use them as training examples to learn ranking functions. We find click-through data are useful and effective in learning ranking functions. A straightforward use of aggregated click-through data can outperform human judgments. We demonstrate that the strategies are only slightly affected by fraudulent clicks. We also reveal that the pairs which are very reliable, e.g., the pairs consisting of documents with large click frequency differences, are not sufficient for learning.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Eugene Agichtein , Eric Brill , Susan Dumais , Robert Ragno, Learning user interaction models for predicting web search result preferences, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
[doi> 10.1145/1148170.1148175]
|
 |
3
|
|
 |
4
|
Chris Burges , Tal Shaked , Erin Renshaw , Ari Lazier , Matt Deeds , Nicole Hamilton , Greg Hullender, Learning to rank using gradient descent, Proceedings of the 22nd international conference on Machine learning, p.89-96, August 07-11, 2005, Bonn, Germany
[doi> 10.1145/1102351.1102363]
|
| |
5
|
C.J.C. Burges, R. Ragno, and Q.V. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems 18, pages 395--402, Cambridge, MA, 2006. MIT Press.
|
 |
6
|
Yunbo Cao , Jun Xu , Tie-Yan Liu , Hang Li , Yalou Huang , Hsiao-Wuen Hon, Adapting ranking SVM to document retrieval, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
[doi> 10.1145/1148170.1148205]
|
 |
7
|
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
 |
11
|
|
 |
12
|
Thorsten Joachims , Laura Granka , Bing Pan , Helene Hembrooke , Geri Gay, Accurately interpreting clickthrough data as implicit feedback, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
[doi> 10.1145/1076034.1076063]
|
 |
13
|
Thorsten Joachims , Laura Granka , Bing Pan , Helene Hembrooke , Filip Radlinski , Geri Gay, Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search, ACM Transactions on Information Systems (TOIS), v.25 n.2, p.7-es, April 2007
[doi> 10.1145/1229179.1229181]
|
 |
14
|
|
| |
15
|
M. Kendall and B.B. Smith. Randomness and random sampling numbers. Journal of the Royal Statistical Society, 101(1):147--166, 1938.
|
| |
16
|
T.-Y. Liu, T. Qin, J. Xu, W. Xiong, and H. Li. Letor: Benchmark dataset for research on learning to rank for information retrieval. In LR4IR 2007 in conjunction with SIGIR 2007, 2007.
|
| |
17
|
{17} F. Radlinski and T. Joachims. Evaluating the robustness of learning from implicit feedback. In Proceedings of the 22nd ICML Workshop on Learning in Web Search, 2005.
|
 |
18
|
|
| |
19
|
F. Radlinski and T. Joachims. Minimally invasive randomization for collecting unbiased preferences from clickthrough logs. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
|
 |
20
|
|
CITED BY
|
R. Agrawal , A. Halverson , K. Kenthapadi , N. Mishra , P. Tsaparas, Generating labels from clicks, Proceedings of the Second ACM International Conference on Web Search and Data Mining, February 09-12, 2009, Barcelona, Spain
|
|