Generating labels from clicks
Source: Web Search and Web Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
Session: Web mining II
Pages 172-181  
Year of Publication: 2009
ISBN:978-1-60558-390-7
Authors
R. Agrawal  Search Labs, Microsoft Research, Mountain View, CA
A. Halverson  Search Labs, Microsoft Research, Mountain View, CA
K. Kenthapadi  Search Labs, Microsoft Research, Mountain View, CA
N. Mishra  Search Labs, Microsoft Research, Mountain View, CA
P. Tsaparas  Search Labs, Microsoft Research, Mountain View, CA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
Microsoft
Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 28,   Downloads (12 Months): 291,   Citation Count: 1
DOI: http://doi.acm.org/10.1145/1498759.1498824

ABSTRACT

The ranking function used by search engines to order results is learned from labeled training data. Each training point is a (query, URL) pair that is labeled by a human judge who assigns a score of Perfect, Excellent, etc., depending on how well the URL matches the query. In this paper, we study whether clicks can be used to automatically generate good labels. Intuitively, documents that are clicked (resp., skipped) in aggregate can indicate relevance (resp., lack of relevance). We give a novel way of transforming clicks into weighted, directed graphs inspired by eye-tracking studies and then devise an objective function for finding cuts in these graphs that induce a good labeling. In its full generality, the problem is NP-hard, but we show that, in the case of two labels, an optimum labeling can be found in linear time. For the more general case, we propose heuristic solutions. Experiments on real click logs show that click-based labels align with the opinion of a panel of judges, especially as the consensus of the panel grows stronger.
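The abstract does not spell out how the click graph is built or what the objective function is, so the following is only a minimal sketch under stated assumptions. It assumes the common "clicked beats skipped-above" heuristic from the eye-tracking literature to create edges, and it assumes a simple agreement score (forward edge weight minus backward edge weight) to evaluate a labeling. The names `build_preference_graph` and `net_agreement`, and the session format, are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_preference_graph(sessions):
    """Aggregate click sessions into a weighted, directed preference graph.

    Assumption (not from the paper): each session is a pair
    (ranked_urls, clicked_urls). An edge (u, v) gains weight 1 whenever
    u was clicked while v, ranked above u, was skipped.
    """
    weights = defaultdict(int)
    for ranked_urls, clicked in sessions:
        for pos, url in enumerate(ranked_urls):
            if url not in clicked:
                continue
            for above in ranked_urls[:pos]:
                if above not in clicked:
                    weights[(url, above)] += 1
    return dict(weights)


def net_agreement(weights, labels):
    """Score a labeling against the graph (illustrative objective only).

    Edges whose source has a higher label than their target add their
    weight; edges pointing the other way subtract it; edges inside one
    label class contribute nothing.
    """
    score = 0
    for (u, v), w in weights.items():
        if labels[u] > labels[v]:
            score += w
        elif labels[u] < labels[v]:
            score -= w
    return score


# Two sessions for the same query, results shown in the order a, b, c.
sessions = [
    (["a", "b", "c"], {"c"}),
    (["a", "b", "c"], {"b", "c"}),
]
graph = build_preference_graph(sessions)
good = net_agreement(graph, {"a": 0, "b": 0, "c": 1})
bad = net_agreement(graph, {"a": 1, "b": 0, "c": 0})
```

In this toy input, the labeling that places `c` in the higher class agrees with every cross-class edge, while the labeling that promotes `a` contradicts them. A cut-based formulation of the kind the abstract describes would search over such labelings rather than score two candidates by hand.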



