ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
A pragmatic approach to dealing with high-variability in network measurements
Full text PdfPdf (743 KB)
Source Internet Measurement Conference archive
Proceedings of the 4th ACM SIGCOMM conference on Internet measurement table of contents
Taormina, Sicily, Italy
SESSION: Traffic generation and analysis table of contents
Pages: 88 - 100  
Year of Publication: 2004
ISBN:1-58113-821-0
Authors
Walter Willinger  AT&T Labs Research
David Alderson  California Institute of Technology
Lun Li  California Institute of Technology
Sponsors
SIGCOMM: ACM Special Interest Group on Data Communication
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 30,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1028788.1028800
What is a DOI?

Warning: The download time has expired please click on the item to try again.


ABSTRACT

The Internet is teeming with high variability phenomena, from measured IP flow sizes to aspects of inferred router-level connectivity, but there still exists considerable debate about how best to deal with this encountered high variability and model it. While one popular approach favors modeling highly variable event sizes with conventional, finite variance distributions such as lognormal or Weibull distributions, Mandelbrot has argued for the last 40 years that there are compelling mathematical, statistical, and practical reasons for why infinite variance distributions are natural candidates for capturing the essence behind high variability phenomena. In this paper, we elaborate on Mandelbrot's arguments and present a methodology that often allows for a clear distinction between the two approaches. In particular, by requiring the resulting models to be resilient to ambiguities (i.e., robust to real-world deficiencies in the underlying network measurements) and internally self-consistent (i.e., insensitive with respect the duration, location, or time of the data collection), we provide a rigorous framework for a qualitative assessment of the observed high variability. We apply the proposed framework to assess previously reported findings about measured Internet traffic and inferred router- and AS-level connectivity. In the process, we also discuss what our approach has to say about recent discussions concerning network traffic being Poisson or self-similar and router-level or AS-level connectivity graphs of the Internet being scale-free or not.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. Albert, H. Jeong, and A.-L. Barabási. Attack and error tolerance of complex networks, Nature 406, pp. 378--382, 2000.
 
2
D. Alderson. Technological and Economic Drivers and Constraints in the Internet's "Last Mile", Technical Report CIT-CDS-04-004, California Institute of Technology, 2004.
 
3
A. Bookstein Informetric distributions, Part II: Resilience to ambiguity Journal of the Amer. Soc. for Information Science, Vol. 41, No. 5, pp. 376--386, 1990.
 
4
 
5
J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun. Internet traffic tends toward Poisson and independent as the load increases. in: Nonlinear Estimation and Classification, C. Holmes, D. Denison, M. Hansen, B. Yu, and B. Mallick (Eds.), Springer-Verlag, New York, 2002.
6
 
7
Q. Chen, H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger. The Origin of power laws in Internet topologies revisited. Proc. IEEE INFOCOM '02, New York, NY, 2002.
 
8
K. Claffy, H.-W. Braun, and G. Polyzos. A parameterizable methodology for Internet traffic flow profiling, IEEE Journal on Selected Areas in Communications 13, pp. 1481--1494, 1995.
 
9
Cooperative Association for Internet Data Analysis (CAIDA), Skitter. Available from www.caida.org/tools/measurement/skitter/.
 
10
A. B. Downey. The structural cause for file size distribution. www.allendowney.com/research/filesize/cds-tr25 2000.ps.gz, 2000.
11
 
12
A. B. Downey. Lognormal and Pareto distributions in the Internet. www.allendowney.com/research/longtail/downey03lognormal.pdf, 2003.
 
13
D. Draper, J. S. Hodges, C. L. Mallows and D. Pregibon. Exchangeability and data analysis, J. R. Statist. Soc. A 156, pp. 9--37, 1993.
14
15
 
16
A. Feldmann, Characteristics of TCP connection arrivals, in: Self-Similar Network Traffic and Performance Evaluation, K. Park and W. Willinger (Eds.), J. Wiley & Sons, New York, pp. 367--400, 2000.
 
17
W. Feller. An Introduction to Probability Theory and its Applications, Volume II, Second Edition. Wiley & Sons, New York, 1966.
 
18
 
19
 
20
 
21
R. Govindan and H. Tangmunarunkit. Heuristics for Internet Map Discovery, Proceeding of IEEE INFOCOM '00, 2000.
 
22
N.L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, Volume1, Second Edition J. Wiley & Sons, New York, 1994.
 
23
T. Karagiannis, M. Faloutsos, M. Molle, and A. Broido. A nonstationary Poisson view of Internet traffic. Proc. IEEE INFOCOM '04, 2004.
 
24
T. G. Kurtz. Limit Theorems for Workload Input Models, in: Stochastic Networks: Theory and Applications, F.P. Kelly, S. Zachary, and I. Ziedins (Eds.), pp. 119--140, Oxford University Press, 1996.
 
25
A. Lakhina, J. W. Byers, M. Crovella, and P. Xie. Sampling biases in IP topology measurements. Proc. IEEE INFOCOM '03, 2003.
26
 
27
B. B. Mandelbrot. New methods in statistical economics, Journal of Political Economics 71, pp. 421--440, 1963.
 
28
B. B. Mandelbrot. Fractals and Scaling in Finance. Springer-Verlag, New York, 1997.
 
29
M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, Vol. 1, No. 2, pp. 226--251, 2004.
 
30
P. Newman, T. Lyons, and G. Minshall. Flow labelled IP: A connectionless approach to ATM, Proc. IEEE INFOCOM '96, 1996.
31
 
32
Passive Measurement and Analysis (PMA) Project, http://pma.nlanr.net/
 
33
 
34
S. I. Resnick. Heavy tail modeling and teletraffic data. Annals of Statistics 25, pp. 1805--1869, 1997.
 
35
Route Views, University of Oregon Route Views Project, Available at http://www.antc.uoregon.edu/route-views/.
36
 
37
G. Samorodnitsky and M. S. Taqqu. Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York, 1994.
 
38
39
40
 
41
J. W. Tukey. Data analysis and behavioral science or learning to bear the quantitative's man burden by shunning badmandments, in: The Collected Works of John W. Tukey, L. W. Jones (Ed.), Vol. III, Wadsworth, Monterey, CA, 1986.
 
42
A. Veres and M. Boda. The chaotic nature of TCP congestion control. Proc. IEEE INFOCOM '00, 2000.
 
43
W. Willinger, R. Govindan, S. Jamin, V. Paxson and S. Shenker. Scaling phenomena in the Internet: Critically examining criticality. Proc. Nat. Acad. Sci. 99, suppl. 1, pp. 2573--2580, 2002.
 
44
W. Willinger and V. Paxson, Where Mathematics meets the Internet, Notices of the AMS 45, pp. 961--970, 1998.
 
45
S.-H. Yook, H. Jeong, and A.-L. Barabási. Modeling the Internet's large-scale topology. Proc. Nat. Acad. Sci. 99, pp. 13382--13386, 2002.
 
46
X. Zhu, J. Yu, and J. C. Doyle. Heavy tails, generalized coding, and optimal Web layout. Proc. IEEE INFOCOM '01, 2001.

CITED BY  8

Collaborative Colleagues:
Walter Willinger: colleagues
David Alderson: colleagues
Lun Li: colleagues