|
ABSTRACT
Recent research on Internet traffic classification algorithms has yield a flurry of proposed approaches for distinguishing types of traffic, but no systematic comparison of the various algorithms. This fragmented approach to traffic classification research leaves the operational community with no basis for consensus on what approach to use when, and how to interpret results. In this work we critically revisit traffic classification by conducting a thorough evaluation of three classification approaches, based on transport layer ports, host behavior, and flow features. A strength of our work is the broad range of data against which we test the three classification approaches: seven traces with payload collected in Japan, Korea, and the US. The diverse geographic locations, link characteristics and application traffic mix in these data allowed us to evaluate the approaches under a wide variety of conditions. We analyze the advantages and limitations of each approach, evaluate methods to overcome the limitations, and extract insights and recommendations for both the study and practical application of traffic classification. We make our software, classifiers, and data available for researchers interested in validating or extending this work.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
CoralReef. http://www.caida.org/tools/measurement/coralreef/.
|
| |
2
|
Ellacoya. http://www.ellacoya.com.
|
| |
3
|
Packeteer. http://www.packeteer.com.
|
| |
4
|
WEKA: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/.
|
 |
5
|
|
 |
6
|
Annalisa Appice , Michelangelo Ceci , Simon Rawles , Peter Flach, Redundant feature elimination for multi-class problems, Proceedings of the twenty-first international conference on Machine learning, p.5, July 04-08, 2004, Banff, Alberta, Canada
[doi> 10.1145/1015330.1015397]
|
| |
7
|
T. Auld, A. W. Moore, and S. F. Gull. Bayesian neural networks for internet traffic classification. IEEE Transactions on Neural Networks, 18(1): 223--239, January 2007.
|
 |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
T. Choi, C. Kim, S. Yoon, J. Park, B. Lee, H. Kim, and H. Chung. Content-aware internet application traffic measurement and analysis. In IEEE/IFIP NOMS, April 2004.
|
| |
13
|
K. Claffy, H.-W. Braun, and G. C. Polyzos. A parameterizable methodology for internet traffic flow profiling. IEEE JSAC Special Issue on the Global Internet, 1995.
|
 |
14
|
|
| |
15
|
Holger Dreger , Anja Feldmann , Michael Mai , Vern Paxson , Robin Sommer, Dynamic application-layer protocol analysis for network intrusion detection, Proceedings of the 15th conference on USENIX Security Symposium, July 31-August 04, 2006, Vancouver, B.C., Canada
|
 |
16
|
|
 |
17
|
Jeffrey Erman , Anirban Mahanti , Martin Arlitt , Carey Williamson, Identifying and discriminating between web and peer-to-peer traffic in the network core, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
[doi> 10.1145/1242572.1242692]
|
| |
18
|
J. Erman, A. Mahanti, and M. Arlitt. Internet Traffic Identification using Machine Learning. In IEEE Globecom, November 2006.
|
 |
19
|
|
| |
20
|
J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson. Offline/Realtime Traffic Classification Using Semi-Supervised Learning. In IFIP Performance, October 2007.
|
 |
21
|
|
 |
22
|
Patrick Haffner , Subhabrata Sen , Oliver Spatscheck , Dongmei Wang, ACAS: automated construction of application signatures, Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data, August 26-26, 2005, Philadelphia, Pennsylvania, USA
[doi> 10.1145/1080173.1080183]
|
 |
23
|
Marios Iliofotou , Prashanth Pappu , Michalis Faloutsos , Michael Mitzenmacher , Sumeet Singh , George Varghese, Network monitoring using traffic dispersion graphs (tdgs), Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, October 24-26, 2007, San Diego, California, USA
[doi> 10.1145/1298306.1298349]
|
 |
24
|
Thomas Karagiannis , Andre Broido , Michalis Faloutsos , Kc claffy, Transport layer identification of P2P traffic, Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, October 25-27, 2004, Taormina, Sicily, Italy
[doi> 10.1145/1028788.1028804]
|
 |
25
|
Thomas Karagiannis , Konstantina Papagiannaki , Michalis Faloutsos, BLINC: multilevel traffic classification in the dark, Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, August 22-26, 2005, Philadelphia, Pennsylvania, USA
|
 |
26
|
Anukool Lakhina , Mark Crovella , Christophe Diot, Mining anomalies using traffic feature distributions, Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, August 22-26, 2005, Philadelphia, Pennsylvania, USA
|
| |
27
|
Z. Li, R. Yuan, and X. Guan. Accurate Classification of the Internet Traffic Based on the SVM Method. In ICC, June 2007.
|
 |
28
|
Justin Ma , Kirill Levchenko , Christian Kreibich , Stefan Savage , Geoffrey M. Voelker, Unexpected means of protocol inference, Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, October 25-27, 2006, Rio de Janeriro, Brazil
[doi> 10.1145/1177080.1177123]
|
| |
29
|
A. McGregor, M. Hall, P. Lorier, and J. Brunskill. Flow clustering using machine learning techniques. In PAM, April 2004.
|
| |
30
|
A. Moore and K. Papagiannaki. Toward the accurate identification of network applications. In PAM, April 2005.
|
 |
31
|
|
| |
32
|
T. T. Nguyen and G. Armitage. Training on multiple sub-flows to optimise the use of machine learning classifiers in real-world ip networks. In IEEE LCN, November 2006.
|
| |
33
|
T. T. Nguyen and G. Armitage. A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys and Tutorials, to appear, 2008.
|
| |
34
|
|
| |
35
|
J. C. Plat. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14, Microsoft Research, April 1998.
|
 |
36
|
Matthew Roughan , Subhabrata Sen , Oliver Spatscheck , Nick Duffield, Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification, Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, October 25-27, 2004, Taormina, Sicily, Italy
[doi> 10.1145/1028788.1028805]
|
 |
37
|
Subhabrata Sen , Oliver Spatscheck , Dongmei Wang, Accurate, scalable in-network identification of p2p traffic using application signatures, Proceedings of the 13th international conference on World Wide Web, May 17-20, 2004, New York, NY, USA
[doi> 10.1145/988672.988742]
|
| |
38
|
N. Williams, S. Zander, and G. Armitage. Evaluating machine learning algorithms for automated network application identification. Technical Report 060401B, CAIA, Swinburne Univ., April 2006.
|
 |
39
|
|
| |
40
|
|
| |
41
|
Y. J. Won, B.-C. Park, H.-T. Ju, M.-S. Kim, and J. W. Hong. A hybrid approach for accurate application traffic idenficiation. In IEEE/IFIP E2EMON, April 2006.
|
| |
42
|
|
|