|
ABSTRACT
The explosive growth and the widespread accessibility of the Web has led to a surge of research activity in the area of information retrieval on the World Wide Web. The seminal papers of Kleinberg [1998, 1999] and Brin and Page [1998] introduced Link Analysis Ranking, where hyperlink structures are used to determine the relative authority of a Web page and produce improved algorithms for the ranking of Web search results. In this article we work within the hubs and authorities framework defined by Kleinberg and we propose new families of algorithms. Two of the algorithms we propose use a Bayesian approach, as opposed to the usual algebraic and graph theoretic approaches. We also introduce a theoretical framework for the study of Link Analysis Ranking algorithms. The framework allows for the definition of specific properties of Link Analysis Ranking algorithms, as well as for comparing different algorithms. We study the properties of the algorithms that we define, and we provide an axiomatic characterization of the INDEGREE heuristic which ranks each node according to the number of incoming links. We conclude the article with an extensive experimental evaluation. We study the quality of the algorithms, and we examine how different structures in the graphs affect their performance.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Achlioptas, D., Fiat, A., Karlin, A., and McSherry, F. 2001. Web search through hub synthesis. In Proceedings of the 42nd Foundation of Computer Science (FOCS 2001). Las Vegas, NY.
|
| |
2
|
Andritsos, P., Tsaparas, P., Miller, R., and Sevcik, K. 2003. LIMBO: Scalable clustering of categorical data. Submitted for publication.
|
 |
3
|
Yossi Azar , Amos Fiat , Anna Karlin , Frank McSherry , Jared Saia, Spectral analysis of data, Proceedings of the thirty-third annual ACM symposium on Theory of computing, p.619-626, July 2001, Hersonissos, Greece
[doi> 10.1145/380752.380859]
|
| |
4
|
Bernardo, J. and Smith, A. 1994. Bayesian Theory. John Wiley & Sons, Chichester, England.
|
 |
5
|
|
 |
6
|
|
| |
7
|
Bianchini, M., Gori, M., and Scarselli, F. 2002. PageRank: A circuital analysis. In Proceedings of the 11th International Word Wide Web Conference (WWW 2002). Poster Session. Hawai.
|
 |
8
|
Allan Borodin , Gareth O. Roberts , Jeffrey S. Rosenthal , Panayiotis Tsaparas, Finding authorities and hubs from link structures on the World Wide Web, Proceedings of the 10th international conference on World Wide Web, p.415-429, May 01-05, 2001, Hong Kong, Hong Kong
[doi> 10.1145/371920.372096]
|
| |
9
|
|
| |
10
|
Broder, A. 2002. Web searching technology overview. In Advanced School and Workshop on Models and Algorithms for the World Wide Web. Udine, Italy.
|
| |
11
|
Soumen Chakrabarti , Byron Dom , Prabhakar Raghavan , Sridhar Rajagopalan , David Gibson , Jon Kleinberg, Automatic resource compilation by analyzing hyperlink structure and associated text, Proceedings of the seventh international conference on World Wide Web 7, p.65-74, April 1998, Brisbane, Australia
|
| |
12
|
Chien, S., Dwork, C., Kumar, R., Simon, D., and Sivakumar, D. 2002. Towards exploiting link evolution. In Workshop on Algorithms for the Web. Vancuver, Canada.
|
| |
13
|
|
| |
14
|
Davison, B. 2000. Recognizing nepotistic links on the web. In AAAI-2000 Workshop on Artificial Intelligence for Web Search. AAAI Press, Austin, TX.
|
| |
15
|
Dempster, A., Laird, N., and Rubin, D. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc., Series B, 39, 1--38.
|
| |
16
|
Diaconis, P. and Graham, R. 1977. Spearman's footrule as a measure of disarray. J. Roy. Statist. Soc. 39, 2, 262--268.
|
| |
17
|
P. Drineas , Alan Frieze , Ravi Kannan , Santosh Vempala , V. Vinay, Clustering in large graphs and matrices, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.291-299, January 17-19, 1999, Baltimore, Maryland, United States
|
 |
18
|
Cynthia Dwork , Ravi Kumar , Moni Naor , D. Sivakumar, Rank aggregation methods for the Web, Proceedings of the 10th international conference on World Wide Web, p.613-622, May 01-05, 2001, Hong Kong, Hong Kong
[doi> 10.1145/371920.372165]
|
 |
19
|
Ronald Fagin , Ravi Kumar , Mohammad Mahdian , D. Sivakumar , Erik Vee, Comparing and aggregating rankings with ties, Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 14-16, 2004, Paris, France
[doi> 10.1145/1055558.1055568]
|
| |
20
|
|
 |
21
|
David Gibson , Jon Kleinberg , Prabhakar Raghavan, Inferring Web communities from link topology, Proceedings of the ninth ACM conference on Hypertext and hypermedia : links, objects, time and space---structure in hypermedia systems: links, objects, time and space---structure in hypermedia systems, p.225-234, June 20-24, 1998, Pittsburgh, Pennsylvania, United States
[doi> 10.1145/276627.276652]
|
| |
22
|
Gilks, W., Richardson, S., and Spiegelhalter, D. 1996. Markov Chain Monte Carlo in practice. Chapman and Hall, London.
|
 |
23
|
|
| |
24
|
Hofmann, T. 1999. Probabilistic latent semantic analysis. In Proceedings of Uncertainty in Artificial Intelligence, UAI'99. Stockholm, Sweden.
|
 |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
Katz, L. 1953. A new status index derived from sociometric analysis. Psychometrika 18, 39--43.
|
| |
29
|
Kendall, M. G. 1970. Rank Correlation Methods. Griffin, London, UK.
|
| |
30
|
|
 |
31
|
|
| |
32
|
Lee, H. C. and Borodin, A. 2003. Perturbation of the hyperlinked environment. In Proceedings of the 9th International Computing and Combinatorics Conference.
|
| |
33
|
|
| |
34
|
Lempel, R. and Moran, S. 2001. Rank stability and rank similarity of Web link-based ranking algorithms. Tech. Rep. CS-2001-22. Technion---Israel Institute of Technology.
|
| |
35
|
Lempel, R. and Moran, S. 2003. Rank stability and rank similarity of Web link-based ranking algorithms. In 2nd Workshop on Algorithms and Models for the Web-Graph (WAW2003). Budapest, Hungary.
|
| |
36
|
Lin, J. 1991. Divergence measures based on the Shannon entropy. Mach. Learn. 37, 1, 145--151.
|
| |
37
|
|
| |
38
|
Mendelzon, A. and Rafiei, D. 2000. What do the neighbours think? Computing Web page reputations. IEEE Data Eng. Bull. 23, 3, 9--16.
|
| |
39
|
Ng, A. Y., Zheng, A. X., and Jordan, M. I. 2001a. Link analysis, eigenvectors, and stability. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). Seattle, Washington.
|
 |
40
|
|
| |
41
|
Page, L., Brin, S., Motwani, R., and Winograd, T. 1998. The PageRank citation ranking: Bringing order to the web. Tech. rep. Stanford Digital Library Technologies Project.
|
| |
42
|
|
| |
43
|
Richardson, M. and Domingos, P. 2002. The intelligent surfer: Probabilistic combination of link and content information in PageRank. In Advances in Neural Information Processing Systems (NIPS) 14.
|
| |
44
|
Roberts, G. and Rosenthal, J. 1998. Markov chain Monte Carlo: Some practical implications of theoretical results (with discussion). Canadian J. Statist. 26, 5--31.
|
| |
45
|
Roberts, G. O. and Rosenthal, J. S. 2003. Downweighting tightly knit communities in World Wide Web rankings. Adv. Appl. Statist. 3, 199--216.
|
| |
46
|
Silverstein, C., Henzinger, M., Marais, H., and Moricz, M. 1998. Analysis of a very large AltaVista query log. Tech. Rep. 1998-014. Digital SRC.
|
| |
47
|
Slonim, N. and Tishby, N. 1999. Agglomerative Information Bottleneck. In Advances in Neural Information Processing Systems (NIPS). Breckenridge, CO.
|
| |
48
|
Smith, A. and Roberts, G. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). J. Roy. Statist. Soc., Series B, 55, 3--24.
|
| |
49
|
Tierney, L. 1994. Markov chains for exploring posterior distributions (with discussion). Ann. Statist. 22, 1701--1762.
|
| |
50
|
Tishby, N., Pereira, F. C., and Bialek, W. 1999. The Information Bottleneck method. In 37th Annual Allerton Conference on Communication, Control and Computing. Urban-Champaign, IL.
|
 |
51
|
|
| |
52
|
|
 |
53
|
|
CITED BY 24
|
|
|
|
|
Reid Andersen , Christian Borgs , Jennifer Chayes , Uriel Feige , Abraham Flaxman , Adam Kalai , Vahab Mirrokni , Moshe Tennenholtz, Trust-based recommendation systems: an axiomatic approach, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
|
|
|
|
|
|
|
|
|
Aristides Gionis , Heikki Mannila , Kai Puolamäki , Antti Ukkonen, Algorithms for discovering bucket orders from data, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|