ABSTRACT
Packet sampling methods such as Cisco's NetFlow are widely employed by large networks to reduce the amount of traffic data measured. A key problem with packet sampling is that it is inherently a lossy process, discarding (potentially useful) information. In this paper, we empirically evaluate the impact of sampling on anomaly detection metrics. Starting with unsampled flow records collected during the Blaster worm outbreak, we reconstruct the underlying packet trace and simulate packet sampling at increasing rates. We then use our knowledge of the Blaster anomaly to build a baseline of normal traffic (without Blaster), against which we can measure the anomaly size at various sampling rates. This approach allows us to evaluate the impact of packet sampling on anomaly detection without being restricted to (or biased by) a particular anomaly detection method.

We find that packet sampling does not disturb the anomaly size when measured in volume metrics such as the number of bytes and number of packets, but grossly biases the number of flows. However, we find that recently proposed entropy-based summarizations of packet and flow counts are affected less by sampling, and expose the Blaster worm outbreak even at higher sampling rates. Our findings suggest that entropy summarizations are more resilient to sampling than volume metrics. Thus, while not perfect, sampling still preserves sufficient distributional structure, which, when harnessed by tools like entropy, can expose hard-to-detect scanning anomalies.
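To make the metrics discussed above concrete, the following is a minimal sketch of how packet sampling and an entropy summary could be computed on a toy trace. It is not the authors' code: the abstract does not state the sampling scheme or the exact entropy definition, so independent random 1-in-N sampling and Shannon entropy over destination addresses are assumed here. The names `build_trace`, `sample_packets`, `sample_entropy`, and `summarize`, and all traffic values, are hypothetical.

```python
import math
import random
from collections import Counter


def build_trace():
    """Synthetic trace (made-up numbers): normal traffic plus a scan."""
    packets = []
    # 200 "normal" flows of 50 packets each, spread over 20 destinations.
    for f in range(200):
        for _ in range(50):
            packets.append({"flow": ("normal", f), "dst": f % 20, "bytes": 500})
    # 2000 single-packet scan flows, each to a distinct destination.
    for s in range(2000):
        packets.append({"flow": ("scan", s), "dst": 1000 + s, "bytes": 40})
    return packets


def sample_packets(packets, rate, rng):
    """Assumed scheme: keep each packet independently with probability 1/rate."""
    keep_prob = 1.0 / rate
    return [p for p in packets if rng.random() < keep_prob]


def sample_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def summarize(packets):
    """Return (packets, bytes, flows, destination entropy) for a packet list."""
    flows = {p["flow"] for p in packets}
    return (
        len(packets),
        sum(p["bytes"] for p in packets),
        len(flows),
        sample_entropy(p["dst"] for p in packets),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sample
    trace = build_trace()
    for rate in (1, 10, 100):
        sampled = sample_packets(trace, rate, rng)
        n_pkts, n_bytes, n_flows, h_dst = summarize(sampled)
        # Scaling by the rate gives naive estimates of the unsampled counts.
        print(rate, n_pkts * rate, n_bytes * rate, n_flows * rate, round(h_dst, 3))
```

In this toy setting, scaling the sampled packet and byte counts by the rate gives rough estimates of the unsampled volumes, whereas the flow count cannot be recovered by simple scaling because single-packet flows vanish far more often than long flows. This mirrors the qualitative distinction drawn in the abstract, but it does not reproduce the paper's measurements.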