| BBM: Bayesian Browsing Model from Petabyte-scale Data |
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Paris, France
SESSION: Research track papers
Pages 537-546
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Chao Liu, Microsoft Research, Redmond, WA, USA
Fan Guo, Carnegie Mellon University, Pittsburgh, PA, USA
Christos Faloutsos, Carnegie Mellon University, Pittsburgh, PA, USA
ABSTRACT
Given a quarter of a petabyte of click-log data, how can we estimate the relevance of each URL for a given query? In this paper, we propose the Bayesian Browsing Model (BBM), a new modeling technique with the following advantages: (a) it performs exact inference; (b) it is single-pass and parallelizable; (c) it is effective. We present two sets of experiments to test the model's effectiveness and efficiency. On the first set, comprising over 50 million search instances of 1.1 million distinct queries, BBM outperforms the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click-log set, spanning a quarter of a petabyte of data, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.
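To make the "single-pass and parallelizable" property concrete, here is a minimal Python sketch of the general pattern such a computation can follow: each shard of the log is scanned once to build per-(query, URL) counts, and partial counts are merged by addition. This is an illustration only, not the BBM inference procedure from the paper. The record layout `(query, url, clicked)` and the names `map_shard`, `merge`, and `aggregate` are assumptions introduced here, and plain impression and click counts stand in for whatever statistics the model actually keeps.

```python
from collections import Counter
from functools import reduce

# Hypothetical log record layout: (query, url, clicked).

def map_shard(records):
    """Scan one shard once, counting impressions and clicks per (query, url)."""
    stats = Counter()
    for query, url, clicked in records:
        stats[(query, url, "impressions")] += 1
        if clicked:
            stats[(query, url, "clicks")] += 1
    return stats

def merge(a, b):
    """Combine two partial results by adding counts key by key."""
    return a + b

def aggregate(shards):
    """Map every shard independently, then reduce the partial counts."""
    return reduce(merge, (map_shard(s) for s in shards), Counter())

# Tiny made-up example: the result does not depend on how the log is split.
shard_a = [("q1", "u1", True), ("q1", "u2", False)]
shard_b = [("q1", "u1", False)]
total = aggregate([shard_a, shard_b])
```

Because addition is associative and commutative, the shards can be processed in any order or on separate machines, which is what lets a computation of this shape run as a MapReduce job.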