ABSTRACT
Sampling is a popular method for improving the scalability of analyzing massive datasets such as network traffic traces, web click traffic and other forms of transaction data. However, in some cases, existing simple sampling strategies fail to capture the underlying distribution of the data. In particular, for network traffic, the sample is dominated by heavy traffic from flash crowds and Denial of Service (DoS) attacks. In such cases, the sample reveals little about the other, smaller traffic patterns, which may nevertheless contain important information about the traffic. We propose an adaptive sampling technique that uses a buffer of frequently seen patterns and a combination of sampling steps to build a hierarchical tree of traffic clusters. We show that this sampling technique ensures that smaller and newer patterns are represented in the cluster tree while satisfying the maximum sampling rate imposed by resource constraints. This technique has two benefits: it preserves the underlying patterns of the data, and it improves efficiency by reducing the sampling of records from known patterns. Through an empirical evaluation on a benchmark dataset, we demonstrate the accuracy of our system in detecting certain types of rare attacks that are not detected by systematic sampling. We also demonstrate the efficiency of our system in terms of the reduced number of records that must be sampled to detect frequent patterns.
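The abstract gives only a high-level description, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a flat pattern label per record instead of a hierarchical cluster tree, and every name and parameter in it (`adaptive_sample`, `pattern_of`, `base_step`, `known_step`, `promote_after`, `buffer_size`) is hypothetical. Records whose pattern is already in the buffer of frequent patterns are sampled with a larger step, while all other records keep the base step, so the overall rate never exceeds one record in `base_step`. The toy trace is deliberately constructed so that the rare records never fall on a systematic sampling position.

```python
from collections import Counter

def systematic_sample(records, step):
    """Plain systematic sampling: keep every `step`-th record, starting at index 0."""
    return records[::step]

def adaptive_sample(records, pattern_of, base_step=10, known_step=100,
                    promote_after=5, buffer_size=1):
    """Illustrative sketch only; not the method from the paper.

    Patterns that have produced `promote_after` sampled records are promoted
    into a small buffer of frequent patterns. Buffered patterns are then
    sampled every `known_step` records; everything else is sampled every
    `base_step` records. Keeping known_step >= base_step keeps the overall
    rate at or below 1/base_step.
    """
    sampled = []
    seen = Counter()      # sampled records per pattern
    skipped = Counter()   # records skipped since the last sample, per class
    frequent = set()      # buffer of frequently seen patterns
    for rec in records:
        pat = pattern_of(rec)
        cls = "known" if pat in frequent else "new"
        step = known_step if cls == "known" else base_step
        skipped[cls] += 1
        if skipped[cls] < step:
            continue
        skipped[cls] = 0
        sampled.append(rec)
        seen[pat] += 1
        if seen[pat] >= promote_after and len(frequent) < buffer_size:
            frequent.add(pat)
    return sampled

# Toy trace (assumed data): a dominant "dos" pattern with a rare "probe"
# record at every index i where i % 20 == 5.
trace = ["probe" if i % 20 == 5 else "dos" for i in range(1000)]

systematic = systematic_sample(trace, 10)
adaptive = adaptive_sample(trace, pattern_of=lambda rec: rec)

print("systematic:", len(systematic), "records,", systematic.count("probe"), "probe")
print("adaptive:  ", len(adaptive), "records,", adaptive.count("probe"), "probe")
```

On this trace, systematic sampling spends its whole budget on the dominant pattern. The sketch instead backs off once that pattern is known, which leaves room for the rare one while sampling fewer records overall.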