Mining top-n local outliers in large databases

Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
San Francisco, California
Pages: 293 - 298
Year of Publication: 2001
ISBN: 1-58113-391-X

Authors
Wen Jin, Simon Fraser University, Burnaby, B.C., Canada
Anthony K. H. Tung, Simon Fraser University, Burnaby, B.C., Canada
Jiawei Han, Simon Fraser University, Burnaby, B.C., Canada

Bibliometrics
Downloads (6 Weeks): 18, Downloads (12 Months): 117, Citation Count: 28

ABSTRACT
Outlier detection is an important task in data mining with numerous applications, including credit card fraud detection and video surveillance. Recent work on outlier detection introduced the notion of a local outlier, in which the degree to which an object is outlying depends on the density of its local neighborhood, and each object can be assigned a Local Outlier Factor (LOF) that represents the likelihood of that object being an outlier. Although the concept of local outliers is useful, computing LOF values for every data object requires a large number of k-nearest neighbor searches and can be computationally expensive. Since most objects are usually not outliers, it is useful to give users the option of finding only the n most outstanding local outliers, i.e., the top-n data objects that are most likely to be local outliers according to their LOFs. However, if the pruning is not done carefully, finding the top-n outliers could require the same amount of computation as finding the LOF for all objects. In this paper, we propose a novel method to efficiently find the top-n local outliers in large databases. The concept of a "micro-cluster" is introduced to compress the data, and an efficient micro-cluster-based local outlier mining algorithm is designed based on this concept. As our algorithm can be adversely affected by overlap among the micro-clusters, we propose a meaningful cut-plane solution for overlapping data. Formal analysis and experiments show that this method achieves good performance in finding the most outstanding local outliers.
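
To make the cost argument in the abstract concrete, the following is a minimal brute-force sketch of LOF scoring followed by top-n selection. It follows the LOF definition of Breunig et al. (reference 3) as commonly stated and is the naive baseline the abstract calls expensive, not the micro-cluster algorithm proposed in the paper. The function names (`k_neighbors`, `lof_scores`, `top_n_outliers`) and the toy data are illustrative assumptions, and the sketch assumes all points are distinct.

```python
import heapq
import math


def k_neighbors(points, i, k):
    """Brute-force k-distance and k-distance neighborhood of points[i]."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]
    # Ties at the k-distance are kept, so the neighborhood may exceed k points.
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k):
    """Return the LOF of every point (assumes distinct points)."""
    n = len(points)
    info = [k_neighbors(points, i, k) for i in range(n)]  # one k-NN search per point

    def reach_dist(p, o):
        # Reachability distance of p with respect to o.
        return max(info[o][0], math.dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for p in range(n):
        neigh = info[p][1]
        lrd.append(len(neigh) / sum(reach_dist(p, o) for o in neigh))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[o] / lrd[p] for o in info[p][1]) / len(info[p][1])
        for p in range(n)
    ]


def top_n_outliers(points, k, n):
    """Return the n (index, LOF) pairs with the largest LOF values."""
    scores = lof_scores(points, k)
    best = heapq.nlargest(n, range(len(points)), key=scores.__getitem__)
    return [(i, scores[i]) for i in best]


if __name__ == "__main__":
    # Hypothetical toy data: a 3x3 grid plus one isolated point at index 9.
    data = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
    print(top_n_outliers(data, k=3, n=1))
```

Note that every point's LOF is computed before the top n are selected, so asking for fewer outliers saves nothing here; avoiding that full computation is the problem the paper addresses.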
REFERENCES
1. V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley & Sons, 1994.
3. Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander, LOF: identifying density-based local outliers, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.93-104, May 15-18, 2000, Dallas, Texas, United States
4. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD'96), pages 226-231, Portland, Oregon, Aug. 1996.
5. Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.73-84, June 01-04, 1998, Seattle, Washington, United States
6. D. Hawkins. Identification of Outliers. Chapman and Hall, London, 1980.
9. Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim, Efficient algorithms for mining outliers from large data sets, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.427-438, May 15-18, 2000, Dallas, Texas, United States
10. Tian Zhang, Raghu Ramakrishnan, Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada

CITED BY 28
Ji Zhang, Meng Lou, Tok Wang Ling, Hai Wang, Hos-Miner: a system for detecting outlying subspaces of high-dimensional data, Proceedings of the Thirtieth international conference on Very large data bases, p.1265-1268, August 31-September 03, 2004, Toronto, Canada