|
ABSTRACT
Learning models to classify rarely occurring target classes is an important problem with applications in network intrusion detection, fraud detection, or deviation detection in general. In this paper, we analyze our previously proposed two-phase rule induction method in the context of learning complete and precise signatures of rare classes. The key feature of our method is that it separately conquers the objectives of achieving high recall and high precision for the given target class. The first phase of the method aims for high recall by inducing rules with high support and a reasonable level of accuracy. The second phase then tries to improve the precision by learning rules to remove false positives in the collection of the records covered by the first phase rules. Existing sequential covering techniques try to achieve high precision for each individual disjunct learned. In this paper, we claim that such approach is inadequate for rare classes, because of two problems: splintered false positives and error-prone small disjuncts. Motivated by the strengths of our two-phase design, we design various synthetic data models to identify and analyze the situations in which two state-of-the-art methods, RIPPER and C4.5 rules, either fail to learn a model or learn a very poor model. In all these situations, our two-phase approach learns a model with significantly better recall and precision levels. We also present a comparison of the three methods on a challenging real-life network intrusion detection dataset. Our method is significantly better or comparable to the best competitor in terms of achieving better balance between recall and precision.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
[1] R. C. Agarwal and M. V. Joshi. PNrule: A new framework for learning classifier models in data mining (a case study in network intrusion detection). In Proceedings of First SIAM Conference on Data Mining, Chicago, April 2001. Expanded version available as IBM Research Division Report, RC 21719, April 2000.
|
| |
2
|
|
| |
3
|
|
| |
4
|
William W. Cohen , Yoram Singer, A simple, fast, and effective rule learner, Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence, p.335-342, July 18-22, 1999, Orlando, Florida, United States
|
| |
5
|
[5] W. W. Cohen. Fast effective rule induction. In Proc. of Twelfth International Conference on Machine Learning, Lake Tahoe, California, 1995.
|
| |
6
|
[6] A. Danyluk and F. Provost, Small disjuncts in action: Learning to diagnose errors in the local loop of the telephone network. In Proc. of Tenth International Conference on Machine Learning, pages 81-88. Morgan Kaufmann, 1993.
|
| |
7
|
[7] C. Elkan. Results of the KDD'99 classifier learning contest. In http:///www-cse.ucsd.edu/~elkan/clresults.html, September 1999.
|
| |
8
|
[8] R. C. Holte, L. Acker, and B. Porter. Concept learning and the problem of small disjuncts. In Proc. of Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), Pages 813-818, 1989.
|
| |
9
|
[9] R. Michalski, I. Mozetic, J. Hong, and N. Lavrac. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In Proc. of Fifth National Conference on AI (AAAI-86). pages 1041-1045, Philadelphia, 1986.
|
| |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
[15] G. M. Weiss. Learning with rare cases and small disjuncts. In Proc. of Twelfth International Conference on Machine Learning, pages 558-565, Lake Tahoe, California, 1995.
|
| |
16
|
|
CITED BY 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Murali Haran , Alan Karr , Michael Last , Alessandro Orso , Adam A. Porter , Ashish Sanil , Sandro Fouche, Techniques for Classifying Executions of Deployed Software to Support Software Engineering Tasks, IEEE Transactions on Software Engineering, v.33 n.5, p.287-304, May 2007
|
|
|
Junjie Wu , Hui Xiong , Peng Wu , Jian Chen, Local decomposition for rare class analysis, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|