Incorporating query difference for learning retrieval functions in world wide web search
Source
Conference on Information and Knowledge Management
Proceedings of the 15th ACM international conference on Information and knowledge management
Arlington, Virginia, USA
SESSION: Personalization and retrieval
Pages: 307 - 316
Year of Publication: 2006
ISBN: 1-59593-433-2
ABSTRACT
We discuss information retrieval methods that aim to serve a diverse stream of user queries, such as those submitted to commercial search engines. We propose methods that emphasize the importance of taking query difference into account when learning effective retrieval functions. We formulate the problem as a multi-task learning problem within a risk minimization framework. In particular, we show how to calibrate the empirical risk to incorporate query difference by introducing nuisance parameters into the statistical models, and we propose an alternating optimization method that learns the retrieval function and the nuisance parameters simultaneously. We work out the details for both the L1 and L2 regularization cases and provide a convergence analysis of the alternating optimization method for the special case in which the retrieval functions belong to a reproducing kernel Hilbert space. We illustrate the effectiveness of the proposed methods using modeling data extracted from a commercial search engine. We also point out how the current framework can be extended in future research.
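The alternating optimization idea in the abstract can be illustrated with a deliberately simplified sketch. It assumes a linear retrieval function, a single additive offset per query as the nuisance parameter, and L2 penalties on both; these modeling choices and every name in the code (`fit`, `offsets`, `lam`, `mu`, the toy data) are illustrative assumptions and are not taken from the paper, whose actual model may differ. Each pass solves exactly for one block of parameters while holding the other fixed, so the regularized objective cannot increase.

```python
# Sketch only: linear retrieval function plus one additive offset per query.
# The offsets stand in for nuisance parameters; this model and all names
# are hypothetical illustrations, not the paper's formulation.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective(data, w, offsets, lam, mu):
    """Squared-error risk with per-query offsets and L2 penalties."""
    loss = sum((y - dot(w, x) - offsets[q]) ** 2
               for q, docs in data.items() for x, y in docs)
    return loss + lam * dot(w, w) + mu * sum(b * b for b in offsets.values())


def fit(data, dim, lam=0.1, mu=0.1, iters=20):
    """data maps a query id to a list of (feature_vector, relevance_label)."""
    w = [0.0] * dim
    offsets = {q: 0.0 for q in data}
    history = []
    for _ in range(iters):
        # Step 1: offsets fixed; ridge regression for w on shifted labels,
        # i.e. solve (X^T X + lam I) w = X^T (y - offset).
        A = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        rhs = [0.0] * dim
        for q, docs in data.items():
            for x, y in docs:
                t = y - offsets[q]
                for i in range(dim):
                    rhs[i] += x[i] * t
                    for j in range(dim):
                        A[i][j] += x[i] * x[j]
        w = solve(A, rhs)
        # Step 2: w fixed; each offset is a shrunken mean residual.
        for q, docs in data.items():
            resid = sum(y - dot(w, x) for x, y in docs)
            offsets[q] = resid / (len(docs) + mu)
        history.append(objective(data, w, offsets, lam, mu))
    return w, offsets, history


# Toy data: both queries share features, but q1's labels sit 2 units higher.
data = {
    "q1": [([1.0, 0.0], 3.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 4.0)],
    "q2": [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 2.0)],
}
w, offsets, history = fit(data, dim=2)
```

In this toy setup the per-query offsets absorb the systematic label shift between the two queries, leaving a single shared `w` to explain the within-query ordering. The `history` list records the objective after each pass and can be inspected to confirm that it never increases.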
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
Additional Classification:
H. Information Systems
H.4 INFORMATION SYSTEMS APPLICATIONS
H.4.m Miscellaneous
General Terms: Algorithms, Experimentation, Theory
Keywords: WWW search, alternating optimization, discounted cumulative gain, gradient boosting, least-squares regression, machine learning, quadratic programming, query dependence, query document feature, query specific feature, regularization, relevance, relevance judgment, retrieval function, risk minimization