ABSTRACT
User browsing information, particularly non-search-related activity, reveals important contextual information about the preferences and intent of web users. In this paper, we expand the use of browsing information for web search ranking and other applications, with an emphasis on analyzing individual user sessions to create aggregate models. In this context, we introduce ClickRank, an efficient, scalable algorithm for estimating web page and web site importance from browsing information. We lay out the theoretical foundation of ClickRank, which is based on an intentional surfer model, and analyze its properties. We evaluate its effectiveness for web search ranking, showing that it contributes significantly to retrieval performance as a novel web search feature. We demonstrate that the results produced by ClickRank for web search ranking are highly competitive with those produced by other approaches, while ClickRank scales better and incurs substantially lower computational cost. Finally, we discuss novel applications of ClickRank in providing an enriched web search experience for users, highlighting the usefulness of our approach for non-ranking tasks.