Semi-supervised co-training and active learning based approach for multi-view intrusion detection
Source
Symposium on Applied Computing
Proceedings of the 2009 ACM symposium on Applied Computing
Honolulu, Hawaii
SESSION: Computer security track
Pages 2042-2048
Year of Publication: 2009
ISBN: 978-1-60558-166-8
Authors
Ching-Hao Mao, National Taiwan University of Science and Technology, Taipei, Taiwan
Hahn-Ming Lee, National Taiwan University of Science and Technology, Taipei, Taiwan and Academia Sinica, Taipei, Taiwan
Devi Parikh, Carnegie Mellon University, Pittsburgh, Pennsylvania
Tsuhan Chen, Carnegie Mellon University, Pittsburgh, Pennsylvania
Si-Yu Huang, National Taiwan University of Science and Technology, Taipei, Taiwan
ABSTRACT
Although immense amounts of data are available from networks and hosts, only a very small proportion of this data is labeled, owing to the cost of obtaining expert labels. This is a significant bottleneck for developing supervised intrusion detection systems, which rely solely on labeled data. Even though the remaining unlabeled data is collected from real network environments and hence potentially holds valuable information for intrusion detection, such systems cannot exploit it. In this work, we intelligently leverage both labeled and unlabeled data. Intrusion detection tasks also lend themselves naturally to a multi-view scenario and can benefit significantly if these multiple views are combined meaningfully. We propose a co-training framework for intrusion detection; co-training is a semi-supervised learning method that can not only utilize unlabeled data but also combine multi-view data. We also employ an active learning framework in which statistically ambiguous parts of the unlabeled data are identified and can then be labeled by an expert. This keeps expert labeling to a minimum while ensuring that the labels obtained from the expert are the most informative. In our experiments, we demonstrate that leveraging the unlabeled data with our proposed method significantly reduces the error rate compared to using the labeled data alone. In addition, our proposed multi-view method has a lower error rate than using a single view.
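The combination of co-training and active learning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the margin-based ambiguity score, the synthetic two-view data, and all names (centroid_fit, centroid_scores, co_train, oracle) are assumptions chosen only to keep the example self-contained.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: return the mean vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_scores(model, x):
    """Return (predicted label, margin). A small margin marks an ambiguous point."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, mu)), c) for c, mu in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def co_train(view_a, view_b, labels, oracle, rounds=10, per_view=1, queries=1):
    """Grow the labeled set using two views plus a few expert (oracle) queries."""
    labels = dict(labels)
    views = (view_a, view_b)
    for _ in range(rounds):
        unlabeled = [i for i in range(len(view_a)) if i not in labels]
        if not unlabeled:
            break
        idx = sorted(labels)
        y = [labels[i] for i in idx]
        models = [centroid_fit([v[i] for i in idx], y) for v in views]
        # Active learning: send the most ambiguous points to the expert.
        ambiguity = {
            i: sum(centroid_scores(m, v[i])[1] for m, v in zip(models, views))
            for i in unlabeled
        }
        for i in sorted(unlabeled, key=ambiguity.get)[:queries]:
            labels[i] = oracle(i)
        # Co-training: each view pseudo-labels the points it is most sure about,
        # and those labels become training data for both views in the next round.
        for m, v in zip(models, views):
            pool = [i for i in unlabeled if i not in labels]
            pool.sort(key=lambda i: centroid_scores(m, v[i])[1], reverse=True)
            for i in pool[:per_view]:
                labels[i] = centroid_scores(m, v[i])[0]
    return labels


# Synthetic data (assumption): class 0 = normal, class 1 = attack, two views.
truth = [i % 2 for i in range(20)]
view_a = [[5.0 * t + 0.1 * (i % 5), 5.0 * t - 0.1 * (i % 3)] for i, t in enumerate(truth)]
view_b = [[-3.0 * t + 0.05 * (i % 4)] for i, t in enumerate(truth)]

asked = []  # indices the simulated expert was asked to label


def oracle(i):
    asked.append(i)
    return truth[i]


final = co_train(view_a, view_b, {0: 0, 1: 1}, oracle)
```

Starting from one labeled example per class, the loop labels the remaining points while consulting the simulated expert for only a fraction of them; the rest are pseudo-labeled by whichever view is most confident. On real traffic data the two views would be distinct feature sets describing the same events, and a stronger base classifier would replace the centroid model.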