Mining top-n local outliers in large databases
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
San Francisco, California
Pages: 293-298
Year of Publication: 2001
ISBN:1-58113-391-X
Authors
Wen Jin  Simon Fraser University, Burnaby, B.C., Canada
Anthony K. H. Tung  Simon Fraser University, Burnaby, B.C., Canada
Jiawei Han  Simon Fraser University, Burnaby, B.C., Canada
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI : American Association for Artificial Intelligence
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/502512.502554

ABSTRACT

Outlier detection is an important task in data mining with numerous applications, such as credit card fraud detection and video surveillance. A recent work on outlier detection has introduced a novel notion of local outlier, in which the degree to which an object is outlying depends on the density of its local neighborhood, and each object can be assigned a Local Outlier Factor (LOF) that represents the likelihood of that object being an outlier. Although the concept of local outliers is a useful one, computing LOF values for every data object requires a large number of k-nearest neighbor searches and can be computationally expensive. Since most objects are usually not outliers, it is useful to give users the option of finding only the n most outstanding local outliers, i.e., the top-n data objects that are most likely to be local outliers according to their LOFs. However, if the pruning is not done carefully, finding the top-n outliers could require the same amount of computation as finding the LOF of all objects. In this paper, we propose a novel method to efficiently find the top-n local outliers in large databases. The concept of "micro-cluster" is introduced to compress the data, and an efficient micro-cluster-based local outlier mining algorithm is designed based on this concept. As our algorithm can be adversely affected by overlap among the micro-clusters, we propose a meaningful cut-plane solution for overlapping data. Formal analysis and experiments show that this method achieves good performance in finding the most outstanding local outliers.
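
To make the cost the abstract refers to concrete, the following is a minimal brute-force sketch of LOF scoring followed by top-n selection. It is not the micro-cluster method proposed in the paper; it follows the standard LOF definition (k-distance, reachability distance, local reachability density) and is the expensive baseline the paper aims to avoid. The function names `lof_scores` and `top_n_local_outliers`, the use of Euclidean distance, and the sample data are assumptions made for illustration only.

```python
import math


def lof_scores(points, k):
    """Brute-force LOF for every point (illustrative baseline, O(n^2) distances).

    Assumes Euclidean distance and that no point has k or more exact
    duplicates (otherwise a reachability sum could be zero).
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise distances: this all-pairs step is what makes the baseline costly.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # k-distance neighborhood (may exceed k entries on ties)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        total = sum(max(k_dist[j], dist[i][j]) for j in neighbors[i])
        lrd.append(len(neighbors[i]) / total)

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


def top_n_local_outliers(points, k, n):
    """Return the n (index, LOF) pairs with the largest LOF values."""
    scores = lof_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical data: a tight group of five points plus one distant point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    for index, score in top_n_local_outliers(data, k=2, n=1):
        print(index, round(score, 2))
```

Note that this sketch scores every object before ranking, which is exactly the situation the abstract warns about: without careful pruning, asking for only the top n saves nothing over computing LOF for all objects.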


