|
ABSTRACT
Measurement, collection, and interpretation of network usage data commonly involves multiple stage of sampling and aggregation. Examples include sampling packets, aggregating them into flow statistics at a router, sampling and aggregation of usage records in a network data repository for reporting, query and archiving. Although unbiased estimates of packet, bytes and flows usage can be formed for each sampling operation, for many applications it is crucial to know the inherent estimation error. Previous work in this area has been limited mainly to analyzing the estimator variance for particular methods, e.g., independent packet sampling. However, the variance is of limited use for more general sampling methods, where the estimate may not be well approximated by a Gaussian distribution. This motivates our paper, in which we establish Chernoff bounds on the likelihood of estimation error in a general multistage combination of measurement sampling and aggregation. We derive the scale against which errors are measured, in terms of the constituent sampling and aggregation operations. In particular this enables us to obtain rigorous confidence intervals around any given estimate. We apply our method to a number of sampling schemes both in the literature and currently deployed, including sampling of packet sampled NetFlow records, Sample and Hold, and Flow Slicing. We obtain one particularly striking result in the first case: that for a range of parameterizations, packet sampling has no additional impact on the estimator confidence derived from our bound, beyond that already imposed by flow sampling.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Daniela Brauckhoff , Bernhard Tellenbach , Arno Wagner , Martin May , Anukool Lakhina, Impact of packet sampling on anomaly detection metrics, Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, October 25-27, 2006, Rio de Janeriro, Brazil
[doi> 10.1145/1177080.1177101]
|
| |
3
|
Cisco. White paper - netflow services and applications. http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm.
|
 |
4
|
|
 |
5
|
Edith Cohen , Nick Duffield , Haim Kaplan , Carsten Lund , Mikkel Thorup, Sketching unaggregated data streams for subpopulation-size queries, Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 11-13, 2007, Beijing, China
[doi> 10.1145/1265530.1265566]
|
 |
6
|
|
| |
7
|
N. Duffield. (Ed.) A Framework for Packet Selection and Reporting. Internet Draft, June 2007. Work in Progress.
|
| |
8
|
N. Duffield and M. Grossglauser. Trajectory Sampling with Unreliable Reporting. In INFOCOM, 2004.
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
N. Duffield, C. Lund, and M. Thorup. Optimal combination of sampled network measurements. In Proc. 5th ACM SIGCOMM Internet Measurement Workshop (IMC), page to appear, 2005.
|
| |
14
|
N. Duffield, C. Lund, and M. Thorup. Sampling to estimate arbitrary subset sums. Technical Report cs.DS/0509026, Computing Research Repository (CoRR), 2005. http://arxiv.org/abs/cs.DS/0509026. Preliminary journal version of {11}.
|
| |
15
|
|
 |
16
|
Cristian Estan , Ken Keys , David Moore , George Varghese, Building a better NetFlow, Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, August 30-September 03, 2004, Portland, Oregon, USA
|
 |
17
|
Cristian Estan , George Varghese, New directions in traffic measurement and accounting, Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, August 19-23, 2002, Pittsburgh, Pennsylvania, USA
|
 |
18
|
|
| |
19
|
D. Horvitz and D. Thompson. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc., 47, 663--685 1952.
|
| |
20
|
J. Jedwab, P. Phaal, and B. Pinna. Traffic estimation for the largest sources in a network, using packet sampling with limited storage. Technical Report HPL-92-3, HP Laboratories, Bristol, March 1992. url: http://www.hpl.hp.com/techreports/92/HPL-92-35.html.
|
 |
21
|
|
 |
22
|
|
| |
23
|
R. R. Kompella and C. Estan. The power of slicing in internet flow measurement. In Proc. ACM Internet Measurement Conference, Berkeley, CA, October 2005.
|
 |
24
|
Jianning Mai , Chen-Nee Chuah , Ashwin Sridharan , Tao Ye , Hui Zang, Is sampled data sufficient for anomaly detection?, Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, October 25-27, 2006, Rio de Janeriro, Brazil
[doi> 10.1145/1177080.1177102]
|
| |
25
|
|
| |
26
|
J. Reeves and S. Panchen. Traffic monitoring with packet-based sampling for defense against security threats. In Proceedings of Passive and Active Measurement Workshop (PAM 2002), Fort Collins, CO, USA, March 25-26 2002.
|
 |
27
|
|
| |
28
|
M. Szegedy and M. Thorup. On the variance of subset sum estimation. Technical Report cs.DS/0702029, Computing Research Repository (CoRR), 2007. http://arxiv.org/abs/cs.DS/0702029.
|
 |
29
|
|
| |
30
|
J. Turian and I. D. Melamed. Computational challenges in parsing by classification. In HLT-NAACL Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, New York, NY, 2006.
|
| |
31
|
T. Zseby. Deployment of sampling methods for sla validation with non-intrusive measurements. In Proceedings of Passive and Active Measurement Workshop (PAM 2002), Fort Collins, CO, USA, March 25-26 2002.
|
|