ABSTRACT
We offer the first large-scale analysis of Web traffic based on network flow data. Using data collected on the Internet2 network, we constructed a weighted bipartite client-server host graph containing more than 18 × 10⁶ vertices and 68 × 10⁶ edges, with edges weighted by the relative traffic flowing between hosts. Viewed as a traffic map of the World-Wide Web, the resulting graph provides valuable information on the statistical patterns that characterize the global flow of information on the Web. Statistical analysis shows that client-server connections and traffic flows exhibit heavy-tailed probability distributions lacking any typical scale. In particular, the absence of an intrinsic average in some of the distributions implies that there is no prototypical scale appropriate for server design, Web-centric network design, or traffic modeling. Inspecting the amount of traffic handled by clients and servers together with their number of connections reveals non-trivial correlations between information flow and patterns of connectivity, as well as anomalous statistical patterns related to the behavior of users on the Web. These results may considerably impact the modeling, scalability analysis, and behavioral study of Web applications.
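The graph construction described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each flow record has already been reduced to a hypothetical `(client, server, bytes)` triple; real flow exports carry many more fields, and this is not necessarily the authors' pipeline. Flows between the same pair of hosts are summed into one weighted edge. For each host, the sketch computes its number of connections (degree) and the total traffic it handles (often called its strength in the weighted-network literature). It also computes an empirical complementary CDF, a common way to inspect heavy-tailed distributions. The function names `build_bipartite_graph`, `degree_and_strength`, and `ccdf` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified flow record: (client host, server host, bytes).
Flow = Tuple[str, str, int]
Edges = Dict[Tuple[str, str], int]


def build_bipartite_graph(flows: Iterable[Flow]) -> Edges:
    """Aggregate flow records into weighted client->server edges.

    The weight of edge (c, s) is the total traffic observed between
    client c and server s over all flows in the input.
    """
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for client, server, nbytes in flows:
        weights[(client, server)] += nbytes
    return dict(weights)


def degree_and_strength(edges: Edges):
    """Return (client_degree, client_strength, server_degree, server_strength).

    Degree counts distinct partners of a host; strength sums the traffic
    on all of its edges. Clients and servers are kept in separate
    dictionaries because the graph is bipartite.
    """
    client_deg: Dict[str, int] = defaultdict(int)
    client_str: Dict[str, int] = defaultdict(int)
    server_deg: Dict[str, int] = defaultdict(int)
    server_str: Dict[str, int] = defaultdict(int)
    for (client, server), w in edges.items():
        client_deg[client] += 1
        client_str[client] += w
        server_deg[server] += 1
        server_str[server] += w
    return dict(client_deg), dict(client_str), dict(server_deg), dict(server_str)


def ccdf(values: Iterable[float]) -> List[Tuple[float, float]]:
    """Empirical complementary CDF: (x, P(X >= x)) for each distinct x."""
    xs = sorted(values)
    n = len(xs)
    points: List[Tuple[float, float]] = []
    for i, x in enumerate(xs):
        # The first occurrence of x sits at index i, so n - i values are >= x.
        if i == 0 or x != xs[i - 1]:
            points.append((x, (n - i) / n))
    return points


if __name__ == "__main__":
    flows = [("c1", "s1", 100), ("c1", "s1", 50), ("c1", "s2", 10), ("c2", "s1", 200)]
    edges = build_bipartite_graph(flows)
    _, _, server_deg, server_str = degree_and_strength(edges)
    print(server_deg, server_str)
    print(ccdf(server_str.values()))
```

On a real trace, plotting `ccdf` of the server strengths or degrees on log-log axes is a quick visual check. A roughly straight line spanning several decades is the usual signature of the scale-free, heavy-tailed behavior reported here.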