ABSTRACT
Mining users' web search activity has a broad range of potential applications, including web result pre-fetching, automatic search query reformulation, click spam detection, estimation of document relevance, and prediction of user satisfaction. This analysis is difficult because the data that search engines record as users interact with them, although abundant, is very noisy. In this work, we explore the utility of mining users' search behavior, represented by observed variables including the time the user spends on a page and whether the user reformulated his or her query. As a case study, we examine the contribution this data makes to predicting the relevance of a document in the absence of document content models. To this end, we first propose a method for grouping the interactions of a particular user according to the different tasks he or she undertakes. With each task corresponding to a distinct information need, we then propose a Bayesian Network to model these interactions holistically, with the aim of identifying distinct patterns of search behavior. Finally, we combine these patterns with a list of custom features and use gradient boosted decision trees to predict the relevance of a set of query-document pairs for which we have relevance assessments. The experimental results confirm the potential of our model: predicting document relevance from a model of the user's search and click behavior yields significant improvements in precision over a baseline model that uses only click and query features, with no Bayesian Network input.
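The task-grouping step can be illustrated with a minimal sketch. The abstract does not specify the grouping method, so the heuristic below (a time-gap threshold combined with query-term overlap) is purely an assumption for illustration, not the authors' algorithm. The names `Interaction`, `jaccard`, `group_into_tasks`, `max_gap`, and `min_overlap` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    """One logged search interaction (hypothetical record layout)."""
    timestamp: float    # seconds since the start of the log
    query: str          # query text issued by the user
    dwell_time: float   # seconds spent on the clicked page
    reformulated: bool  # whether this query reformulates the previous one


def jaccard(a: str, b: str) -> float:
    """Term-level Jaccard overlap between two queries."""
    terms_a, terms_b = set(a.lower().split()), set(b.lower().split())
    if not terms_a and not terms_b:
        return 1.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def group_into_tasks(interactions: List[Interaction],
                     max_gap: float = 1800.0,
                     min_overlap: float = 0.2) -> List[List[Interaction]]:
    """Assumed heuristic: an interaction joins the current task when it
    follows the previous one within max_gap seconds and shares enough
    query terms with it; otherwise it starts a new task."""
    tasks: List[List[Interaction]] = []
    for item in sorted(interactions, key=lambda i: i.timestamp):
        if tasks:
            prev = tasks[-1][-1]
            close_in_time = item.timestamp - prev.timestamp <= max_gap
            similar = jaccard(item.query, prev.query) >= min_overlap
            if close_in_time and similar:
                tasks[-1].append(item)
                continue
        tasks.append([item])
    return tasks


log = [
    Interaction(0, "cheap flights paris", 12, False),
    Interaction(40, "cheap flights paris december", 95, True),
    Interaction(300, "python list sort", 30, False),
    Interaction(5000, "cheap flights paris", 20, False),
]
tasks = group_into_tasks(log)
```

In this sketch, each resulting task would supply the observed variables (dwell time, reformulation flag) for one information need. A real system would likely need a more robust notion of task similarity than term overlap.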