BBM: Bayesian Browsing Model from Petabyte-scale Data
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Research track papers
Pages 537-546  
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Chao Liu  Microsoft Research, Redmond, WA, USA
Fan Guo  Carnegie Mellon University, Pittsburgh, PA, USA
Christos Faloutsos  Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557081

ABSTRACT

Given a quarter of a petabyte of click-log data, how can we estimate the relevance of each URL for a given query? In this paper, we propose the Bayesian Browsing Model (BBM), a new modeling technique with the following advantages: (a) it does exact inference; (b) it is single-pass and parallelizable; (c) it is effective.

We present two sets of experiments to test model effectiveness and efficiency. On the first set of over 50 million search instances of 1.1 million distinct queries, BBM outperforms the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click-log set, spanning a quarter of a petabyte of data, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.

