ACM Home Page
Please provide us with feedback. Feedback
Link analysis ranking: algorithms, theory, and experiments
Full text PdfPdf (1.72 MB)
Source ACM Transactions on Internet Technology (TOIT) archive
Volume 5 ,  Issue 1  (February 2005) table of contents
Pages: 231 - 297  
Year of Publication: 2005
ISSN:1533-5399
Authors
Allan Borodin  University of Toronto, Toronto, Canada
Gareth O. Roberts  Lancaster University
Jeffrey S. Rosenthal  University of Toronto, Toronto, Canada
Panayiotis Tsaparas  University of Helsinki, Helsinki, Finland
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 36,   Downloads (12 Months): 397,   Citation Count: 24
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1052934.1052942
What is a DOI?

ABSTRACT

The explosive growth and the widespread accessibility of the Web has led to a surge of research activity in the area of information retrieval on the World Wide Web. The seminal papers of Kleinberg [1998, 1999] and Brin and Page [1998] introduced Link Analysis Ranking, where hyperlink structures are used to determine the relative authority of a Web page and produce improved algorithms for the ranking of Web search results. In this article we work within the hubs and authorities framework defined by Kleinberg and we propose new families of algorithms. Two of the algorithms we propose use a Bayesian approach, as opposed to the usual algebraic and graph theoretic approaches. We also introduce a theoretical framework for the study of Link Analysis Ranking algorithms. The framework allows for the definition of specific properties of Link Analysis Ranking algorithms, as well as for comparing different algorithms. We study the properties of the algorithms that we define, and we provide an axiomatic characterization of the INDEGREE heuristic which ranks each node according to the number of incoming links. We conclude the article with an extensive experimental evaluation. We study the quality of the algorithms, and we examine how different structures in the graphs affect their performance.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Achlioptas, D., Fiat, A., Karlin, A., and McSherry, F. 2001. Web search through hub synthesis. In Proceedings of the 42nd Foundation of Computer Science (FOCS 2001). Las Vegas, NY.
 
2
Andritsos, P., Tsaparas, P., Miller, R., and Sevcik, K. 2003. LIMBO: Scalable clustering of categorical data. Submitted for publication.
3
 
4
Bernardo, J. and Smith, A. 1994. Bayesian Theory. John Wiley & Sons, Chichester, England.
5
6
 
7
Bianchini, M., Gori, M., and Scarselli, F. 2002. PageRank: A circuital analysis. In Proceedings of the 11th International Word Wide Web Conference (WWW 2002). Poster Session. Hawai.
8
 
9
 
10
Broder, A. 2002. Web searching technology overview. In Advanced School and Workshop on Models and Algorithms for the World Wide Web. Udine, Italy.
 
11
 
12
Chien, S., Dwork, C., Kumar, R., Simon, D., and Sivakumar, D. 2002. Towards exploiting link evolution. In Workshop on Algorithms for the Web. Vancuver, Canada.
 
13
 
14
Davison, B. 2000. Recognizing nepotistic links on the web. In AAAI-2000 Workshop on Artificial Intelligence for Web Search. AAAI Press, Austin, TX.
 
15
Dempster, A., Laird, N., and Rubin, D. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc., Series B, 39, 1--38.
 
16
Diaconis, P. and Graham, R. 1977. Spearman's footrule as a measure of disarray. J. Roy. Statist. Soc. 39, 2, 262--268.
 
17
18
19
 
20
21
 
22
Gilks, W., Richardson, S., and Spiegelhalter, D. 1996. Markov Chain Monte Carlo in practice. Chapman and Hall, London.
23
 
24
Hofmann, T. 1999. Probabilistic latent semantic analysis. In Proceedings of Uncertainty in Artificial Intelligence, UAI'99. Stockholm, Sweden.
25
26
27
 
28
Katz, L. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 39--43.
 
29
Kendall, M. G. 1970. Rank Correlation Methods. Griffin, London, UK.
 
30
31
 
32
Lee, H. C. and Borodin, A. 2003. Perturbation of the hyperlinked environment. In Proceedings of the 9th International Computing and Combinatorics Conference.
 
33
 
34
Lempel, R. and Moran, S. 2001. Rank stability and rank similarity of Web link-based ranking algorithms. Tech. Rep. CS-2001-22. Technion---Israel Institute of Technology.
 
35
Lempel, R. and Moran, S. 2003. Rank stability and rank similarity of Web link-based ranking algorithms. In 2nd Workshop on Algorithms and Models for the Web-Graph (WAW2003). Budapest, Hungary.
 
36
Lin, J. 1991. Divergence measures based on the Shannon entropy. Mach. Learn. 37, 1, 145--151.
 
37
 
38
Mendelzon, A. and Rafiei, D. 2000. What do the neighbours think? Computing Web page reputations. IEEE Data Eng. Bull. 23, 3, 9--16.
 
39
Ng, A. Y., Zheng, A. X., and Jordan, M. I. 2001a. Link analysis, eigenvectors, and stability. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). Seattle, Washington.
40
 
41
Page, L., Brin, S., Motwani, R., and Winograd, T. 1998. The PageRank citation ranking: Bringing order to the web. Tech. rep. Stanford Digital Library Technologies Project.
 
42
 
43
Richardson, M. and Domingos, P. 2002. The intelligent surfer: Probabilistic combination of link and content information in PageRank. In Advances in Neural Information Processing Systems (NIPS) 14.
 
44
Roberts, G. and Rosenthal, J. 1998. Markov chain Monte Carlo: Some practical implications of theoretical results (with discussion). Canadian J. Statist. 26, 5--31.
 
45
Roberts, G. O. and Rosenthal, J. S. 2003. Downweighting tightly knit communities in World Wide Web rankings. Adv. Appl. Statist. 3, 199--216.
 
46
Silverstein, C., Henzinger, M., Marais, H., and Moricz, M. 1998. Analysis of a very large AltaVista query log. Tech. Rep. 1998-014. Digital SRC.
 
47
Slonim, N. and Tishby, N. 1999. Agglomerative Information Bottleneck. In Advances in Neural Information Processing Systems (NIPS). Breckenridge, CO.
 
48
Smith, A. and Roberts, G. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). J. Roy. Statist. Soc., Series B, 55, 3--24.
 
49
Tierney, L. 1994. Markov chains for exploring posterior distributions (with discussion). Ann. Statist. 22, 1701--1762.
 
50
Tishby, N., Pereira, F. C., and Bialek, W. 1999. The Information Bottleneck method. In 37th Annual Allerton Conference on Communication, Control and Computing. Urban-Champaign, IL.
51
 
52
53

CITED BY  24

Collaborative Colleagues:
Allan Borodin: colleagues
Gareth O. Roberts: colleagues
Jeffrey S. Rosenthal: colleagues
Panayiotis Tsaparas: colleagues