Dr. Searcher and Mr. Browser: a unified hyperlink-click graph
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
SESSION: IR: enterprise search
Pages 1123-1132  
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Barbara Poblete  University Pompeu Fabra, Barcelona, Spain
Carlos Castillo  Yahoo! Research, Barcelona, Spain
Aristides Gionis  Yahoo! Research, Barcelona, Spain
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458231

ABSTRACT

We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink and click graphs. The hyperlink graph expresses link structure among Web pages, while the click graph is a bipartite graph of queries and documents denoting users' searching behavior extracted from a search engine's query log.
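
The union described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_unified_graph`, the `("q", ...)` and `("d", ...)` node tags, and the choice to make click edges bidirectional are all hypothetical.

```python
from collections import defaultdict


def build_unified_graph(hyperlinks, clicks):
    """Return adjacency sets for the union of a hyperlink and a click graph.

    hyperlinks: iterable of (source_url, target_url) pairs.
    clicks: iterable of (query, clicked_url) pairs from a query log.
    Nodes are tagged ("d", url) for documents and ("q", text) for queries so
    the two node types never collide.
    """
    graph = defaultdict(set)
    for src, dst in hyperlinks:
        graph[("d", src)].add(("d", dst))
        graph[("d", dst)]  # make sure the link target exists as a node
    for query, url in clicks:
        q, d = ("q", query), ("d", url)
        # Assumption: click edges are walked in both directions.
        graph[q].add(d)
        graph[d].add(q)
    return dict(graph)
```

Documents that appear in both inputs become the points where the two graphs join, since a page reached by a hyperlink and a page clicked for a query map to the same `("d", url)` node.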

Our main motivation is to model, in a unified way, the two main activities of users on the Web, searching and browsing, and at the same time to analyze the effects of random walks on this new graph. The intuition behind this task is to measure how the combination of link structure and usage data provides information beyond that contained in each structure independently.

Our experimental results show that both hyperlink and click graphs have strengths and weaknesses when their stationary distribution scores are used for ranking Web pages. Furthermore, our evaluation indicates that the unified graph always generates consistent and robust scores that closely follow the best result obtained from either individual graph, even when applied to "noisy" data. We believe that the unified Web graph has several useful properties for improving current Web document ranking, as well as for generating new rankings of its own. In particular, stationary distribution scores derived from random walks on the combined graph can be used as an indicator of whether structural or usage data are more reliable in different situations.
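
As a rough illustration of what a stationary distribution score is, the sketch below runs power iteration for a random walk with uniform teleportation over an adjacency mapping. The abstract does not specify the walk used in the paper, so the PageRank-style jump, the `damping=0.85` default, the uniform handling of dangling nodes, and the function name `stationary_distribution` are all assumptions made for this example.

```python
def stationary_distribution(graph, damping=0.85, iterations=100):
    """Approximate the stationary distribution of a random walk with teleportation.

    graph: dict mapping each node to a collection of its out-neighbours.
    Every neighbour must also appear as a key of the dict.
    """
    nodes = list(graph)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Probability mass on nodes without out-links is spread uniformly.
        dangling = sum(score[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {v: base for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * score[v] / len(graph[v])
                for w in graph[v]:
                    nxt[w] += share
        score = nxt
    return score
```

The same function can be applied to a hyperlink graph, a click graph, or their union, which makes it possible to compare the resulting scores node by node.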



