ACM Home Page
Please provide us with feedback. Feedback
The Web as a graph: How far we are
Full text PdfPdf (579 KB)
Source ACM Transactions on Internet Technology (TOIT) archive
Volume 7 ,  Issue 1  (February 2007) table of contents
Article No. 4  
Year of Publication: 2007
ISSN:1533-5399
Authors
Debora Donato  University of Rome, Roma, Italy
Luigi Laura  University of Rome, Roma, Italy
Stefano Leonardi  University of Rome, Roma, Italy
Stefano Millozzi  University of Rome, Roma, Italy
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 22,   Downloads (12 Months): 182,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1189740.1189744
What is a DOI?

ABSTRACT

In this article we present an experimental study of the properties of webgraphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges, made available by the WebBase project at Stanford, as well as several synthetic ones generated according to various models proposed recently. We investigate several topological properties of such graphs, including the number of bipartite cores and strongly connected components, the distribution of degrees and PageRank values and some correlations; we present a comparison study of the models against these measures.Our findings are that (i) the WebBase sample differs slightly from the (older) samples studied in the literature, and (ii) despite the fact that these models do not catch all of its properties, they do exhibit some peculiar behaviors not found, for example, in the models from classical random graph theory.Moreover we developed a software library able to generate and measure massive graphs in secondary memory; this library is publicy available under the GPL licence. We discuss its implementation and some computational issues related to secondary memory graph algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
Bollobas, B. and Riordan, O. 2003. Robustness and vulnerability of scale-free random graphs. Internet Math. 1, 1, 1--35.
 
4
Barabasi, A. and Albert, A. 1999. Emergence of scaling in random networks. Science 286, 509.
 
5
Boldi, P., Codenotti, B., Santini, M., and Vigna, S. 2002. Ubicrawler: A scalable fully distributed web crawler.
6
 
7
 
8
9
10
 
11
 
12
cyvellance. www.cyvellance.com. Cyvellance.
 
13
Diestel, R. 1997. Graph Theory. Springer, New York.
 
14
 
15
Donato, D., Laura, L., Leonardi, S., and Millozzi, S. 2004a. Large scale properties of the Webgraph. Europ. J. Phys. B 38, 2, 239--243. DOI: 10.1140/epjb/e2004-00056-6.
 
16
 
17
Donato, D., Laura, L., Leonardi, S., and Millozzi, S. 2004c. A software library for generating and measuring massive Webgraphs. Tech. Rep. D13, COSIN European Research Project. http://www.dis.uniroma1.it/~cosin/html_pages/COSIN-Tools.htm.
 
18
Erdös, P. and Rényi, A. 1960. On the evoluation of random graphs Publ. Math. Inst. Hung. Acad. Sci 5.
 
19
Gleich, D., Zuchov, L., and Berkhin, P. 2004. Fast Parallel PageRank: A Linear System Approach. Tech. Rep. 038, Yahoo! Research.
20
 
21
Harary, F. 1969. Graph Theory. Addison-Wesley, Reading, MA.
 
22
Haveliwala, T. H. 1999. Efficient computation of PageRank. Tech. rep., Stanford University.
23
 
24
Kleinberg, J., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. 1999. The Web as a graph: measurements, models and methods. In Proceedings of the International Conference on Combinatorics and Computing. 1--18.
 
25
 
26
Kraft, R., Hastor, E., and Stata, R. 2003. Timelinks: Exploring the link structure of the evolving Web. In Second Workshop on Algorithms and Models for the Web-Graph (WAW2003).
 
27
 
28
 
29
Laura, L., Leonardi, S., Caldarelli, G., and De Los Rios, P. 2002. A multi-layer model for the Webgraph. In On-line proceedings of the 2nd International Workshop on Web Dynamics.
 
30
Mitzenmacher, M. 2003. A brief history of generative models for power law and lognormal distributions. Internet Math. 1, 2.
 
31
 
32
Pennock, D., Flake, G., Lawrence, S., Glover, E., and Giles, C. 2002. Winners don't take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences 99, 8 (April), 5207--5211.
33
 
34
Tarjan, R. E. 1972. Depth-first search and linear graph algorithms. SIAM J. Comput. 1, 2, 146--160.
 
35
Vitter, J. and Shriver, E. 1994a. Algorithms for parallel memory i: Two-level memories. Algorithmica 12, 2-3, 107--114.
 
36
Vitter, J. and Shriver, E. 1994b. Algorithms for parallel memory ii: Hierarchical multilevel memories. Algorithmica 12, 2-3, 148--169.
37
 
38
webbase. The Stanford Webbase project. http://www-diglib.stanford.edu/~testbed/doc2/WebBase/.
 
39


Collaborative Colleagues:
Debora Donato: colleagues
Luigi Laura: colleagues
Stefano Leonardi: colleagues
Stefano Millozzi: colleagues