ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Data streaming algorithms for efficient and accurate estimation of flow size distribution
Full text PdfPdf (405 KB)
Source Joint International Conference on Measurement and Modeling of Computer Systems archive
Proceedings of the joint international conference on Measurement and modeling of computer systems table of contents
New York, NY, USA
SESSION: Simulation tools and analysis techniques table of contents
Pages: 177 - 188  
Year of Publication: 2004
ISBN:1-58113-873-3
Also published in ...
Authors
Abhishek Kumar  Georgia Institute of Technology
Minho Sung  Georgia Institute of Technology
Jun (Jim) Xu  Georgia Institute of Technology
Jia Wang  AT&T Labs -- Research
Sponsors
SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 66,   Citation Count: 22
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1005686.1005709
What is a DOI?

Warning: The download time has expired please click on the item to try again.


ABSTRACT

Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has been focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of flow distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow-distribution.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
4
5
6
7
8
9
10
 
11
S. Muthukrishnan, "Data streams: Algorithms and applications," available at http://athos.rutgers.edu/~muthu/.
12
 
13
14
 
15
A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the em algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1--38, 1977.
 
16
A. Kumar, J. Xu, J. Wang, O. Spatschek, and L. Li, "Space-Code Bloom Filter for Efficient per-flow Traffic Measurement," in Proc. IEEE Infocom, Mar. 2004.
17
 
18
D. Moor, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver, "The spread of the sapphire/slammer worm," in Technical Report, CAIDA, 2003.
19
20
 
21

CITED BY  22

Collaborative Colleagues:
Abhishek Kumar: colleagues
Minho Sung: colleagues
Jun (Jim) Xu: colleagues
Jia Wang: colleagues