Incorporating query difference for learning retrieval functions in world wide web search
Source: Conference on Information and Knowledge Management
Proceedings of the 15th ACM international conference on Information and knowledge management
Arlington, Virginia, USA
SESSION: Personalization and retrieval
Pages: 307 - 316  
Year of Publication: 2006
ISBN: 1-59593-433-2
Authors
Hongyuan Zha  Georgia Institute of Technology, Atlanta, GA
Zhaohui Zheng  Yahoo! Inc., Sunnyvale, CA
Haoying Fu  Yahoo! Inc., Sunnyvale, CA
Gordon Sun  Yahoo! Inc., Sunnyvale, CA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1183614.1183660

ABSTRACT

We discuss information retrieval methods that aim at serving a diverse stream of user queries, such as those submitted to commercial search engines. We propose methods that emphasize the importance of taking query differences into consideration when learning effective retrieval functions. We formulate the problem as a multi-task learning problem using a risk minimization framework. In particular, we show how to calibrate the empirical risk to incorporate query differences by introducing nuisance parameters in the statistical models, and we propose an alternating optimization method that simultaneously learns the retrieval function and the nuisance parameters. We work out the details for both the L1 and L2 regularization cases, and provide a convergence analysis of the alternating optimization method for the special case in which the retrieval functions belong to a reproducing kernel Hilbert space. We illustrate the effectiveness of the proposed methods using modeling data extracted from a commercial search engine. We also point out how the current framework can be extended in future research.
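
The alternating optimization idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the nuisance parameters or the loss, so everything concrete here is an assumption made for illustration: a linear retrieval function `w`, a single additive offset per query as the nuisance parameter, a squared-error risk, and an L2 penalty on `w`. The names `fit_alternating`, `make_toy_queries`, and `solve_linear` are hypothetical and do not come from the paper.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def objective(queries, w, offsets, lam):
    """Squared-error risk with per-query offsets, plus an L2 penalty on w."""
    risk = sum((y - b - dot(x, w)) ** 2
               for (X, Y), b in zip(queries, offsets)
               for x, y in zip(X, Y))
    return risk + lam * dot(w, w)

def fit_alternating(queries, dim, lam=1.0, n_iters=20):
    """Alternate between per-query offsets (assumed nuisance parameters) and w."""
    w = [0.0] * dim
    offsets = [0.0] * len(queries)
    history = []
    for _ in range(n_iters):
        # Step 1: with w fixed, the best offset for a query is its mean residual.
        offsets = [sum(y - dot(x, w) for x, y in zip(X, Y)) / len(Y)
                   for X, Y in queries]
        # Step 2: with offsets fixed, w solves a ridge regression on the
        # offset-corrected labels: (X^T X + lam I) w = X^T (y - b).
        A = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        rhs = [0.0] * dim
        for (X, Y), b in zip(queries, offsets):
            for x, y in zip(X, Y):
                for i in range(dim):
                    rhs[i] += x[i] * (y - b)
                    for j in range(dim):
                        A[i][j] += x[i] * x[j]
        w = solve_linear(A, rhs)
        history.append(objective(queries, w, offsets, lam))
    return w, offsets, history

def make_toy_queries(w_true, true_offsets, docs_per_query=25, seed=0):
    """Synthetic data: each query shifts all of its labels by its own offset."""
    rng = random.Random(seed)
    queries = []
    for b in true_offsets:
        X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(docs_per_query)]
        Y = [dot(x, w_true) + b for x in X]
        queries.append((X, Y))
    return queries

if __name__ == "__main__":
    data = make_toy_queries([1.5, -2.0], [0.0, 3.0, -1.0, 5.0])
    w, offsets, history = fit_alternating(data, dim=2, lam=1e-3, n_iters=40)
    print("w:", w)
    print("offsets:", offsets)
```

Because each step exactly minimizes the objective over one block of variables while the other is held fixed, the recorded objective values never increase from one iteration to the next. This sketch covers only an L2-regularized linear case; the L1 case and the reproducing kernel Hilbert space setting mentioned in the abstract would need different update steps.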



