|
ABSTRACT
The celebrated PageRank algorithm has proved to be a very effective paradigm for ranking results of web search algorithms. In this paper we refine this basic paradigm to take into account several evolving prominent features of the web, and propose several algorithmic innovations. First, we analyze features of the rapidly growing "frontier" of the web, namely the part of the web that crawlers are unable to cover for one reason or another. We analyze the effect of these pages and find it to be significant. We suggest ways to improve the quality of ranking by modeling the growing presence of "link rot" on the web as more sites and pages fall out of maintenance. Finally we suggest new methods of ranking that are motivated by the hierarchical structure of the web, are more efficient than PageRank, and may be more resistant to direct manipulation.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
Gianni Amati, Iadh Ounis, and Vassilis Plachouras. The dynamic absorbing model for the web. Technical Report TR-2003-137, University of Glasgow, April 2003.
|
| |
3
|
Arvind Arasu, Jasmine Novak, Andrew S. Tomkins, and John A. Tomlin. Pagerank computation and the structure of the web: Experiments and algorithms. In Poster Proc. WWW2002, Honolulu, 2002.
|
| |
4
|
T. Berners-Lee, R. Fielding, and L. Masinter. Uniform resource identifiers (URI): Generic syntax. http://www.ietf.org/rfc/rfc2396.txthttp://www.ietf.org/rfc/rfc2396.t%xt. RFC 2396.
|
 |
5
|
Allan Borodin , Gareth O. Roberts , Jeffrey S. Rosenthal , Panayiotis Tsaparas, Finding authorities and hubs from link structures on the World Wide Web, Proceedings of the 10th international conference on World Wide Web, p.415-429, May 01-05, 2001, Hong Kong, Hong Kong
[doi> 10.1145/371920.372096]
|
| |
6
|
Sergey Brin, Rajeev Motwani, Lawrence Page, and Terry Winograd. What can you do with a web in your pocket? Data Engineering Bulletin, 21:37--47, 1998.
|
| |
7
|
Soumen Chakrabarti , Byron Dom , Prabhakar Raghavan , Sridhar Rajagopalan , David Gibson , Jon Kleinberg, Automatic resource compilation by analyzing hyperlink structure and associated text, Proceedings of the seventh international conference on World Wide Web 7, p.65-74, April 1998, Brisbane, Australia
|
 |
8
|
|
| |
9
|
Steve Chien, Cynthia Dwork, Ravi Kumar, and D. Sivakumar. Towards exploiting link evolution. In Workshop on Algorithms and Models for the Web Graph, 2001.
|
| |
10
|
|
 |
11
|
|
| |
12
|
Brian D. Davison. Recognizing nepotistic links on the web. In Artificial Intelligence for Web Search, pages 23--28. AAAI Press, July 2000.
|
 |
13
|
Chris Ding , Xiaofeng He , Parry Husbands , Hongyuan Zha , Horst D. Simon, PageRank, HITS and a unified framework for link analysis, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 11-15, 2002, Tampere, Finland
[doi> 10.1145/564376.564440]
|
 |
14
|
|
| |
15
|
Nadav Eiron and Kevin S. McCurley. Locality, hierarchy, and bidirectionality in the web. In Workshop on Algorithms and Models for the Web Graph, Budapest, May 2003.
|
 |
16
|
|
 |
17
|
Ronald Fagin , Ravi Kumar , Kevin S. McCurley , Jasmine Novak , D. Sivakumar , John A. Tomlin , David P. Williamson, Searching the workplace web, Proceedings of the 12th international conference on World Wide Web, May 20-24, 2003, Budapest, Hungary
[doi> 10.1145/775152.775204]
|
| |
18
|
Gene H. Golub and Charles van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
|
 |
19
|
|
| |
20
|
Taher Haveliwala. Efficient computation of pagerank. Technical report, Stanford University, 1999.
|
 |
21
|
|
| |
22
|
Sepandar Kamvar, Taher Haveliwala, and Gene Golub. Adaptive methods for the computation of pagerank. Technical report, April 2003.
|
| |
23
|
Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub. Exploiting the block structure of the web for computing pagerank. Technical report, Stanford University, 2003.
|
 |
24
|
|
 |
25
|
|
| |
26
|
John Markwell and David W. Brooks. Link rot limits the usefulness of web-based educational materials in biochemistry and molecular biology. Biochem. Mol. Biol. Educ., 31:69--72, 2003.
|
 |
27
|
|
| |
28
|
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998. Paper SIDL-WP-1999-0120 (version of 11/11/1999).
|
| |
29
|
|
| |
30
|
|
 |
31
|
|
 |
32
|
|
CITED BY 32
|
|
|
|
|
|
|
|
Gui-Rong Xue , Qiang Yang , Hua-Jun Zeng , Yong Yu , Zheng Chen, Exploiting the hierarchical structure for link analysis, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Carlos Castillo , Debora Donato , Luca Becchetti , Paolo Boldi , Stefano Leonardi , Massimo Santini , Sebastiano Vigna, A reference collection for web spam, ACM SIGIR Forum, v.40 n.2, p.11-24, December 2006
|
|
|
|
|
|
Prasanna Kumar Desikan , Nishith Pathak , Jaideep Srivastava , Vipin Kumar, Divide and conquer approach for efficient pagerank computation, Proceedings of the 6th international conference on Web engineering, July 11-14, 2006, Palo Alto, California, USA
|
|
|
|
|
|
|
|
|
Guang Feng , Tie-Yan Liu , Ying Wang , Ying Bao , Zhiming Ma , Xu-Dong Zhang , Wei-Ying Ma, AggregateRank: bringing order to web sites, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
|
|
|
|
|
|
|
|
|
Klaus Berberich , Srikanta Bedathur , Gerhard Weikum , Michalis Vazirgiannis, Comparing apples and oranges: normalized pagerank for evolving graphs, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
|
|
|
Anirban Dasgupta , Arpita Ghosh , Ravi Kumar , Christopher Olston , Sandeep Pandey , Andrew Tomkins, The discoverability of the web, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
|
|
|
András Benczúr , István Bíró , Károly Csalogány , Tamás Sarlós, Web spam detection via commercial intent analysis, Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, May 08-08, 2007, Banff, Alberta, Canada
|
|
|
|
|
|
Carlos Castillo , Debora Donato , Aristides Gionis , Vanessa Murdock , Fabrizio Silvestri, Know your neighbors: web spam detection using the web topology, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hao Ma , Haixuan Yang , Irwin King , Michael R. Lyu, Learning latent semantic relations from clickthrough data for query suggestion, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
|
|