Smoothing clickthrough data for web search ranking
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Clickthrough models
Pages 355-362
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Jianfeng Gao  Microsoft Research, Redmond, USA
Wei Yuan  University of Montreal, Montreal, Canada
Xiao Li  Microsoft Research, Redmond, USA
Kefeng Deng  Microsoft China, Beijing, China
Jian-Yun Nie  University of Montreal, Montreal, Canada
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1572003

ABSTRACT

Incorporating features extracted from clickthrough data (called clickthrough features) has been demonstrated to significantly improve the performance of ranking models for Web search applications. Such benefits, however, are severely limited by the data sparseness problem, i.e., many queries and documents have no or very few clicks. The ranker thus cannot rely strongly on clickthrough features for document ranking. This paper presents two smoothing methods to expand clickthrough data: query clustering via Random Walk on click graphs and a discounting method inspired by the Good-Turing estimator. Both methods are evaluated on real-world data in three Web search domains. Experimental results show that the ranking models trained on smoothed clickthrough features consistently outperform those trained on unsmoothed features. This study demonstrates both the importance and the benefits of dealing with the sparseness problem in clickthrough data.
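
To make the idea of expanding sparse click data concrete, here is a minimal sketch of a random walk on a query-document click graph. It is not the paper's algorithm, which this page does not describe. The function names (row_normalize, smooth_clicks), the self_weight mixing parameter, and the toy click counts are all assumptions chosen for illustration. The sketch walks query -> document -> query to find related queries, then lets each query borrow a share of their clicks.

```python
import numpy as np


def row_normalize(m):
    """Scale each row of non-negative counts to sum to 1; all-zero rows stay zero."""
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)


def smooth_clicks(clicks, steps=1, self_weight=0.5):
    """Spread click counts across queries that share clicked documents.

    clicks[i, j] is the number of clicks on document j for query i.
    steps is the number of query -> document -> query round trips.
    self_weight (an illustrative assumption) is how much of a query's
    own click counts are kept versus borrowed from related queries.
    """
    clicks = np.asarray(clicks, dtype=float)
    p_q2d = row_normalize(clicks)        # P(document | query)
    p_d2q = row_normalize(clicks.T)      # P(query | document)
    q2q = p_q2d @ p_d2q                  # one round trip on the click graph
    walk = np.linalg.matrix_power(q2q, steps)
    mix = self_weight * np.eye(len(clicks)) + (1.0 - self_weight) * walk
    return mix @ clicks


# Toy data (hypothetical): 3 queries x 3 documents.
# Query 0 and query 1 both clicked document 0; query 2 is isolated.
clicks = np.array([
    [4, 0, 0],
    [2, 2, 0],
    [0, 0, 3],
])
smoothed = smooth_clicks(clicks)
print(smoothed)
```

In this toy example, query 0 never clicked document 1, but it receives a small non-zero count for it because query 1 shares a clicked document with query 0. Query 2 shares nothing with the others, so its row is unchanged.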


