ACM Home Page
Please provide us with feedback. Feedback
Confident estimation for multistage measurement sampling and aggregation
Full text PdfPdf (559 KB)
Source
Joint International Conference on Measurement and Modeling of Computer Systems archive
Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems table of contents
Annapolis, MD, USA
SESSION: Measurements table of contents
Pages 109-120  
Year of Publication: 2008
ISBN:978-1-60558-005-0
Also published in ...
Authors
Edith Cohen  AT&T Labs-Research, Florham Park, NJ, USA
Nick Duffield  AT&T Labs-Research, Florham Park, NJ, USA
Carsten Lund  AT&T Labs-Research, Florham Park, NJ, USA
Mikkel Thorup  AT&T Labs-Research, Florham Park, NJ, USA
Sponsors
SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 19,   Downloads (12 Months): 137,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1375457.1375471
What is a DOI?

ABSTRACT

Measurement, collection, and interpretation of network usage data commonly involves multiple stage of sampling and aggregation. Examples include sampling packets, aggregating them into flow statistics at a router, sampling and aggregation of usage records in a network data repository for reporting, query and archiving. Although unbiased estimates of packet, bytes and flows usage can be formed for each sampling operation, for many applications it is crucial to know the inherent estimation error. Previous work in this area has been limited mainly to analyzing the estimator variance for particular methods, e.g., independent packet sampling. However, the variance is of limited use for more general sampling methods, where the estimate may not be well approximated by a Gaussian distribution.

This motivates our paper, in which we establish Chernoff bounds on the likelihood of estimation error in a general multistage combination of measurement sampling and aggregation. We derive the scale against which errors are measured, in terms of the constituent sampling and aggregation operations. In particular this enables us to obtain rigorous confidence intervals around any given estimate. We apply our method to a number of sampling schemes both in the literature and currently deployed, including sampling of packet sampled NetFlow records, Sample and Hold, and Flow Slicing. We obtain one particularly striking result in the first case: that for a range of parameterizations, packet sampling has no additional impact on the estimator confidence derived from our bound, beyond that already imposed by flow sampling.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
Cisco. White paper - netflow services and applications. http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm.
4
5
6
 
7
N. Duffield. (Ed.) A Framework for Packet Selection and Reporting. Internet Draft, June 2007. Work in Progress.
 
8
N. Duffield and M. Grossglauser. Trajectory Sampling with Unreliable Reporting. In INFOCOM, 2004.
9
10
11
 
12
 
13
N. Duffield, C. Lund, and M. Thorup. Optimal combination of sampled network measurements. In Proc. 5th ACM SIGCOMM Internet Measurement Workshop (IMC), page to appear, 2005.
 
14
N. Duffield, C. Lund, and M. Thorup. Sampling to estimate arbitrary subset sums. Technical Report cs.DS/0509026, Computing Research Repository (CoRR), 2005. http://arxiv.org/abs/cs.DS/0509026. Preliminary journal version of {11}.
 
15
16
17
18
 
19
D. Horvitz and D. Thompson. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc., 47, 663--685 1952.
 
20
J. Jedwab, P. Phaal, and B. Pinna. Traffic estimation for the largest sources in a network, using packet sampling with limited storage. Technical Report HPL-92-3, HP Laboratories, Bristol, March 1992. url: http://www.hpl.hp.com/techreports/92/HPL-92-35.html.
21
22
 
23
R. R. Kompella and C. Estan. The power of slicing in internet flow measurement. In Proc. ACM Internet Measurement Conference, Berkeley, CA, October 2005.
24
 
25
 
26
J. Reeves and S. Panchen. Traffic monitoring with packet-based sampling for defense against security threats. In Proceedings of Passive and Active Measurement Workshop (PAM 2002), Fort Collins, CO, USA, March 25-26 2002.
27
 
28
M. Szegedy and M. Thorup. On the variance of subset sum estimation. Technical Report cs.DS/0702029, Computing Research Repository (CoRR), 2007. http://arxiv.org/abs/cs.DS/0702029.
29
 
30
J. Turian and I. D. Melamed. Computational challenges in parsing by classification. In HLT-NAACL Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, New York, NY, 2006.
 
31
T. Zseby. Deployment of sampling methods for sla validation with non-intrusive measurements. In Proceedings of Passive and Active Measurement Workshop (PAM 2002), Fort Collins, CO, USA, March 25-26 2002.

Collaborative Colleagues:
Edith Cohen: colleagues
Nick Duffield: colleagues
Carsten Lund: colleagues
Mikkel Thorup: colleagues