Generating labels from clicks
Source
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
SESSION: Web mining II
Pages 172-181
Year of Publication: 2009
ISBN: 978-1-60558-390-7
Authors
R. Agrawal, Search Labs, Microsoft Research, Mountain View, CA
A. Halverson, Search Labs, Microsoft Research, Mountain View, CA
K. Kenthapadi, Search Labs, Microsoft Research, Mountain View, CA
N. Mishra, Search Labs, Microsoft Research, Mountain View, CA
P. Tsaparas, Search Labs, Microsoft Research, Mountain View, CA
ABSTRACT
The ranking function used by search engines to order results is learned from labeled training data. Each training point is a (query, URL) pair that is labeled by a human judge who assigns a score of Perfect, Excellent, etc., depending on how well the URL matches the query. In this paper, we study whether clicks can be used to automatically generate good labels. Intuitively, documents that are clicked (resp., skipped) in aggregate can indicate relevance (resp., lack of relevance). We give a novel way of transforming clicks into weighted, directed graphs inspired by eye-tracking studies and then devise an objective function for finding cuts in these graphs that induce a good labeling. In its full generality, the problem is NP-hard, but we show that, in the case of two labels, an optimum labeling can be found in linear time. For the more general case, we propose heuristic solutions. Experiments on real click logs show that click-based labels align with the opinion of a panel of judges, especially as the consensus of the panel grows stronger.
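The idea of turning clicks into a weighted, directed graph can be illustrated with a minimal sketch. The abstract does not give the paper's exact edge rules or its cut objective, so everything below is an assumption made for illustration: a simple "clicked beats skipped-above" rule adds an edge from a clicked URL to each unclicked URL ranked above it, and edge weights count how often that happens across impressions. The function names `build_preference_graph` and `net_preference` are hypothetical, and the net score is a naive stand-in, not the paper's labeling algorithm.

```python
from collections import defaultdict


def build_preference_graph(impressions):
    """Build a weighted, directed preference graph for one query.

    impressions: iterable of (ranked_urls, clicked_urls) pairs.
    Assumed rule (not taken from the paper): a clicked URL is preferred
    over every unclicked URL shown above it in the same impression.
    Returns a dict mapping (preferred, less_preferred) -> weight.
    """
    weights = defaultdict(int)
    for ranked_urls, clicked_urls in impressions:
        clicked = set(clicked_urls)
        for pos, url in enumerate(ranked_urls):
            if url not in clicked:
                continue
            # Every unclicked result ranked above a click was skipped.
            for above in ranked_urls[:pos]:
                if above not in clicked:
                    weights[(url, above)] += 1
    return dict(weights)


def net_preference(graph):
    """Outgoing minus incoming edge weight per URL (a naive relevance signal)."""
    scores = defaultdict(int)
    for (preferred, other), weight in graph.items():
        scores[preferred] += weight
        scores[other] -= weight
    return dict(scores)


if __name__ == "__main__":
    # Three hypothetical impressions of the same ranked list.
    logs = [
        (["a", "b", "c"], ["c"]),
        (["a", "b", "c"], ["b"]),
        (["a", "b", "c"], ["a"]),
    ]
    graph = build_preference_graph(logs)
    print(graph)
    print(net_preference(graph))
```

A labeling step would then partition the URLs so that edges mostly point from higher labels to lower ones. Thresholding the net score is one crude way to do that for two labels, but it should not be read as the optimum procedure the abstract refers to.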
CITED BY
Ariel Fuxman, Anitha Kannan, Andrew B. Goldberg, Rakesh Agrawal, Panayiotis Tsaparas, John Shafer, Improving classification accuracy using automatically extracted training data, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France