Mining user web search activity with layered Bayesian networks or how to capture a click in its context
Source: Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
Session: Web mining II
Pages: 162-171
Year of Publication: 2009
ISBN: 978-1-60558-390-7
Authors
Benjamin Piwowarski  University of Glasgow, Scotland, UK
Georges Dupret  Yahoo! Labs, Sunnyvale, CA
Rosie Jones  Yahoo! Labs, Sunnyvale, CA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Yahoo! Research
Microsoft
Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1498759.1498823

ABSTRACT

Mining user web search activity has a broad range of potential applications, including web result pre-fetching, automatic query reformulation, click spam detection, estimation of document relevance and prediction of user satisfaction. This analysis is difficult because the data that search engines record while users interact with them, although abundant, is very noisy. In this work, we explore the utility of mining users' search behavior, represented by observed variables such as the time the user spends on a page and whether the user reformulated his or her query. As a case study, we examine how much this data contributes to predicting the relevance of a document in the absence of document content models. To this end, we first propose a method for grouping the interactions of a particular user according to the different tasks he or she undertakes. Since each task corresponds to a distinct information need, we then propose a Bayesian Network that models these interactions holistically, with the aim of identifying distinct patterns of search behavior. Finally, we combine these patterns with a set of custom features and use gradient boosted decision trees to predict the relevance of a set of query-document pairs for which we have relevance assessments. The experimental results confirm the potential of our model: predicting document relevance from a model of the user's search and click behavior yields significant improvements in precision over a baseline that uses only click and query features, without Bayesian Network input.
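The last step described above, feeding behavior features into gradient boosted decision trees to predict relevance, can be illustrated with a small dependency-free sketch. It implements least-squares gradient boosting in the sense of Friedman's gradient boosting machine, using depth-1 regression trees (stumps) as base learners. This is not the authors' configuration: the feature layout (dwell time, a reformulation flag, and a behavior-pattern id standing in for the Bayesian Network output), the toy data, and the names `Stump`, `fit_stump` and `GradientBoostedStumps` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """A depth-1 regression tree: one split, one constant value per side."""
    feature: int      # index of the feature to split on
    threshold: float  # an example goes left if x[feature] <= threshold
    left: float       # value predicted on the left branch
    right: float      # value predicted on the right branch

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[Sequence[float]], r: List[float]) -> Stump:
    """Choose the split minimizing the squared error of the residuals r."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue  # a split must send examples both ways
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if err < best_err:
                best, best_err = Stump(j, t, lm, rm), err
    if best is None:
        # Every feature is constant: fall back to the mean residual.
        m = sum(r) / len(r)
        best = Stump(0, float("inf"), m, m)
    return best


class GradientBoostedStumps:
    """Least-squares gradient boosting: each stump fits the current residuals."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "GradientBoostedStumps":
        # The constant minimizing squared loss is the mean label.
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        for _ in range(self.n_rounds):
            # For loss (y - F)^2 / 2 the negative gradient is the residual y - F.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            # Shrink each step by the learning rate before adding it.
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


# Toy data, invented for illustration. Hypothetical features per query-document pair:
# [dwell time in seconds, reformulated next query (0/1), behavior pattern id].
# The pattern id is treated as a plain number here for simplicity.
X = [[5, 1, 0], [8, 1, 0], [12, 1, 1], [40, 0, 1], [65, 0, 2], [90, 0, 2]]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # 1.0 = judged relevant
model = GradientBoostedStumps(n_rounds=100, learning_rate=0.1).fit(X, y)
print(round(model.predict([70, 0, 2]), 3))  # close to 1.0 on this toy data
```

Stumps keep the sketch short. Production libraries typically grow deeper trees, which let the model capture interactions between features (for example, dwell time mattering only when no reformulation follows), and add options such as subsampling.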

