|
ABSTRACT
The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been recently tackled by characterizing the properties of its representative graphs, in which vertices and directed edges are identified with Web pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this article, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This spurs the issue of the presence of sampling biases and structural differences of Web crawls that might induce properties not representative of the actual global underlying graph. In short, the stability of the widely accepted statistical description of the Web is called into question. In order to provide a more accurate characterization of the Web graph, we study statistical measures beyond the degree distribution, such as degree-degree correlation functions or the statistics of reciprocal connections. The latter appears to enclose the relevant correlations of the WWW graph and carry most of the topological information of the Web. The analysis of this quantity is also of major interest in relation to the navigability and searchability of the Web.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
Albert, R., Jeong, H., and Barabási, A.-L. 1999. Diameter of the World-Wide Web. Nature 401, 6749, 130--131.
|
 |
3
|
|
| |
4
|
|
| |
5
|
Barabási, A.-L. and Albert, R. 1999. Emergence of scaling in random networks. Science 286, 5439, 509--512.
|
| |
6
|
Barabási, A.-L., Albert, R., and Jeong, H. 2000. Scale-free characteristics of random networks: The topology of the World-Wide Web. Physica A 281, 1-4, 69--77.
|
| |
7
|
Barrat, A., Barthélemy, M., and Vespignani, A. 2004. Traffic-driven model of the World Wide Web graph, Stephano Leonardi, Ed. Algorithms and Models for the Web-Graph. Lecture Notes in Computer Science, vol. 3243. Springer, Berlin, Heidelburg, Germany, 56--67.
|
| |
8
|
Boguñá, M. and Serrano, M. A. 2005. Generalized percolation in random directed networks. Phys. Rev. E 72, 1, 016106.
|
| |
9
|
|
 |
10
|
|
| |
11
|
Andrei Broder , Ravi Kumar , Farzin Maghoul , Prabhakar Raghavan , Sridhar Rajagopalan , Raymie Stata , Andrew Tomkins , Janet Wiener, Graph structure in the Web, Computer Networks: The International Journal of Computer and Telecommunications Networking, v.33 n.1-6, p.309-320, June 2000
|
| |
12
|
|
| |
13
|
Cohen, R., Erez, K., ben Avraham, D., and Havlin, S. 2000. Resilience of the Internet to random breakdown. Phys. Rev. Lett. 85, 21, 4626.
|
| |
14
|
|
| |
15
|
Stephen Dill , Ravi Kumar , Kevin S. McCurley , Sridhar Rajagopalan , D. Sivakumar , Andrew Tomkins, Self-similarity in the Web, Proceedings of the 27th International Conference on Very Large Data Bases, p.69-78, September 11-14, 2001
|
| |
16
|
Donato, D., Laura, L., Leonardi, S., and Millozzi, S. 2004. Large scale properties of the Webgraph. Eur. Phys. J. B 38, 2, 239--243.
|
| |
17
|
Donato, D., Leonardi, S., Millozzi, S., and Tsaparas, P. 2005. Mining the inner structure of the Web graph. In Proceedings of the Eighth International Workshop on the Web and Databases (WebDB). 145--150.
|
| |
18
|
|
| |
19
|
Eckmann, J. P. and Moses, E. 2002. Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Procc. Natl. Acad. Sci. 99, 9, 5825--5829.
|
| |
20
|
Fortunato, S., Boguñá, M., Flammini, A., and Menczer, F. 2006. Approximating pagerank from in-degree. In cs.IR/0511016, presented at the Fourth Workshop on Algorithms and Models for the Web-Graph, Nov. 30 -- Dec. 1, Banff, Alta., (Canada).
|
| |
21
|
Garlaschelli, D. and Loffredo, M. I. 2004. Patterns of link reciprocity in directed networks. Phys. Rev. Lett. 93, 26, 268701.
|
 |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
R. Kumar , P. Raghavan , S. Rajagopalan , D. Sivakumar , A. Tomkins , E. Upfal, Stochastic models for the Web graph, Proceedings of the 41st Annual Symposium on Foundations of Computer Science, p.57, November 12-14, 2000
|
| |
26
|
|
| |
27
|
Lawrence, S. and Giles, C. L. 1998. Searching the world wide web. Science 280, 5360, 98--100.
|
| |
28
|
Lawrence, S. and Giles, C. L. 1999. Accessibility of information on the Web. Nature 400, 6740, 107--109.
|
 |
29
|
Priya Mahadevan , Dmitri Krioukov , Kevin Fall , Amin Vahdat, Systematic topology analysis and generation using degree correlations, Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, September 11-15, 2006, Pisa, Italy
|
| |
30
|
Mossa, S., Barthélemy, M., Stanley, H. E., and Amaral, L. A. N. 2002. Truncation of power law behavior in scale-free network models due to information filtering. Phys. Rev. Lett. 88, 13, 138701.
|
| |
31
|
Newman, M. E. J. 2002. Assortative mixing in networks. Phys. Rev. Lett. 89, 20, 208701.
|
| |
32
|
Pastor-Satorras, R., Vázquez, A., and Vespignani, A. 2001. Dynamical and correlation properties of the Internet. Phys. Rev. Lett. 87, 25, 258701.
|
| |
33
|
Pastor-Satorras, R. and Vespignani, A. 2001. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 14, 3200--3203.
|
| |
34
|
|
| |
35
|
Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., and Giles, C. L. 2002. Winners don't take all: Characterizing the competition for links on the web. Proc. Natl. Acad. Sci. 99, 8, 5207--5211.
|
| |
36
|
Rusmevichientong, P., Pennock, D. M., Lawrence, S., and Giles, C. L. 2001. Methods for sampling pages uniformly from the World Wide Web. In Proceedings of the AAAI Fall Symposium on Using Uncertainty Within Computation. 121--128.
|
CITED BY 2
|
|
Mark R. Meiss , Filippo Menczer , Santo Fortunato , Alessandro Flammini , Alessandro Vespignani, Ranking web sites with real user traffic, Proceedings of the international conference on Web search and web data mining, February 11-12, 2008, Palo Alto, California, USA
|
|
|
|
|