Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
San Francisco, California
Pages: 389 - 394
Year of Publication: 2001
ISBN:1-58113-391-X
Authors
Kenji Yamanishi, NEC Corporation, 4-1-1 Miyazaki, Miyamae, Kawasaki, Kanagawa 216-8555, Japan
Jun-ichi Takeuchi, NEC Corporation, 4-1-1 Miyazaki, Miyamae, Kawasaki, Kanagawa 216-8555, Japan
ABSTRACT
This paper is concerned with the problem of detecting outliers in unlabeled data. In prior work we developed SmartSifter, an on-line outlier detection algorithm based on unsupervised learning from data. Building on SmartSifter, this paper presents a new framework for outlier filtering that applies supervised and unsupervised learning techniques iteratively in order to make the detection process more effective and more understandable. The outline of the framework is as follows. In the first round, for an initial dataset, we run SmartSifter to give each data point a score, with a high score indicating a high possibility of being an outlier. Next, we create labeled examples by giving positive labels to a number of the highest-scored data points and negative labels to a number of the lowest-scored ones. We then construct an outlier filtering rule from these examples by supervised learning; the rule is generated based on the principle of minimizing extended stochastic complexity. In the second round, for a new dataset, we filter the data using the constructed rule and then run SmartSifter again on the filtered data to evaluate it and update the filtering rule. Applying our framework to network intrusion detection, we demonstrate that 1) it can significantly improve the accuracy of SmartSifter, and 2) outlier filtering rules can help the user to discover a general pattern of an outlier group.
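The two-round loop outlined in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and it rests on several assumptions:

- The data are one-dimensional numbers.
- An absolute z-score stands in for SmartSifter.
- A single threshold chosen by training error stands in for the rule learner based on extended stochastic complexity.
- "Filtered data" is read as the data the current rule does not flag.
- Examples from both rounds are pooled when the rule is updated.

All function names are hypothetical.

```python
import statistics

def score_outliers(values):
    """Stand-in unsupervised scorer (not SmartSifter): absolute z-score."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values) or 1.0  # avoid dividing by zero
    return [abs(v - mean) / spread for v in values]

def make_labeled_examples(values, scores, n_pos, n_neg):
    """Label the n_pos highest-scored values 1 and the n_neg lowest-scored 0."""
    ranked = [v for _, v in sorted(zip(scores, values), reverse=True)]
    return [(v, 1) for v in ranked[:n_pos]] + [(v, 0) for v in ranked[-n_neg:]]

def learn_threshold_rule(examples):
    """Stand-in supervised learner: pick the threshold t with the fewest
    training errors for the rule 'v >= t is an outlier'."""
    best_t, best_err = None, None
    for t in sorted({v for v, _ in examples}):
        err = sum((v >= t) != bool(y) for v, y in examples)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def apply_rule(values, threshold):
    """Split values into those the rule flags and those it lets through."""
    flagged = [v for v in values if v >= threshold]
    passed = [v for v in values if v < threshold]
    return flagged, passed

# Round 1: score the initial dataset, label the extremes, learn a rule.
round1 = [10, 11, 9, 10, 12, 11, 10, 9, 50, 55]
examples1 = make_labeled_examples(round1, score_outliers(round1), n_pos=2, n_neg=4)
rule1 = learn_threshold_rule(examples1)

# Round 2: filter a new dataset with the rule, re-score what passes through,
# and relearn the rule from the pooled examples (pooling is an assumption).
round2 = [10, 12, 9, 11, 60, 10, 11, 30, 10, 9]
flagged2, passed2 = apply_rule(round2, rule1)
examples2 = make_labeled_examples(passed2, score_outliers(passed2), n_pos=1, n_neg=4)
rule2 = learn_threshold_rule(examples1 + examples2)

print("round 1 rule: v >=", rule1)
print("flagged in round 2:", flagged2)
print("updated rule: v >=", rule2)
```

In this toy run the first rule catches only the most extreme value in the new data. Re-scoring the remainder surfaces a milder outlier, and the updated rule then lowers the threshold to cover it. The learned rule stays human-readable, which is the property the abstract highlights.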