|
ABSTRACT
The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link analysis to assign a global importance score to each web page. The PageRank scores of all the pages are usually determined off-line in a large-scale computation on the entire hyperlink graph of the web, and several recent studies have focused on improving the efficiency of this computation, which may require multiple hours on a workstation. However, in some scenarios, such as online analysis of link evolution and mining of large web archives such as the Internet Archive, it may be desirable to quickly approximate or update the PageRanks of individual nodes without performing a large-scale computation on the entire graph. We address this problem by studying several methods for efficiently estimating the PageRank score of a particular web page using only a small subgraph of the entire web. In our model, we assume that the graph is accessible remotely via a link database (such as the AltaVista Connectivity Server) or is stored in a relational database that performs lookups on disks to retrieve node and connectivity information. We show that a reasonable estimate of the PageRank value of a node is possible in most cases by retrieving only a moderate number of nodes in the local neighborhood of the node.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
A. Arasu, J. Cho, H. Garcia-Molina, and S. Raghavan. Searching the web. ACM Transactions on Internet Technologies, 1(1), June 2001.
|
| |
3
|
A. Arasu, J. Novak, Tomkins A, and J. Tomlin. Pagerank computation and the structure of the web: Experiments and algorithms. In Poster presentation at the 11th Int. World Wide Web Conference, May 2002.
|
| |
4
|
|
| |
5
|
Krishna Bharat , Andrei Broder , Monika Henzinger , Puneet Kumar , Suresh Venkatasubramanian, The connectivity server: fast access to linkage information on the Web, Proceedings of the seventh international conference on World Wide Web 7, p.469-477, April 1998, Brisbane, Australia
|
 |
6
|
|
| |
7
|
|
| |
8
|
Soumen Chakrabarti , Byron Dom , Prabhakar Raghavan , Sridhar Rajagopalan , David Gibson , Jon Kleinberg, Automatic resource compilation by analyzing hyperlink structure and associated text, Proceedings of the seventh international conference on World Wide Web 7, p.65-74, April 1998, Brisbane, Australia
|
| |
9
|
Soumen Chakrabarti , Byron E. Dom , S. Ravi Kumar , Prabhakar Raghavan , Sridhar Rajagopalan , Andrew Tomkins , David Gibson , Jon Kleinberg, Mining the Web's Link Structure, Computer, v.32 n.8, p.60-67, August 1999
[doi> 10.1109/2.781636]
|
 |
10
|
|
| |
11
|
S. Chien, C. Dwork, R. Kumar, D. Simon, and D. Sivakumar. Link evolution: Analysis and algorithms. In Workshop on Algorithms and Models for the Web Graph, 2002.
|
| |
12
|
|
| |
13
|
T.H. Haveliwala. Efficient computation of pagerank. Technical report, Stanford University, October 1999. Available at tt http://dbpubs.stanford.edu:8090/pub/1999-31.
|
 |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
| |
22
|
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Computer Science Department, Stanford University, 1999.
|
| |
23
|
|
| |
24
|
M. Richardson and P. Domingos. The intelligent surfer: Probabilistic combination of link and content information in pagerank. In Advances in Neural Information Processing Systems, 2002.
|
| |
25
|
|
CITED BY 8
|
|
|
|
|
|
|
|
|
|
|
Tamás Sarlós , Adrás A. Benczúr , Károly Csalogány , Dániel Fogaras , Balázs Rácz, To randomize or not to randomize: space optimal summaries for hyperlink analysis, Proceedings of the 15th international conference on World Wide Web, May 23-26, 2006, Edinburgh, Scotland
|
|
|
|
|
|
|
|
|
|
|
|
|
|