Outlier detection for high dimensional data
Source: International Conference on Management of Data
Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data
Santa Barbara, California, United States
Pages: 37 - 46  
Year of Publication: 2001
ISBN:1-58113-332-4
Authors
Charu C. Aggarwal  IBM T. J. Watson Research Center, Yorktown Heights, NY
Philip S. Yu  IBM T. J. Watson Research Center, Yorktown Heights, NY
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/375663.375668

ABSTRACT

The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.
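
The claim that proximity loses meaning in high dimensions can be illustrated with a small experiment. The sketch below is not taken from the paper; it assumes uniformly random points in the unit hypercube and uses a hypothetical helper, relative_contrast, which measures how much farther the farthest point is from a query than the nearest point. As the number of dimensions grows, this value tends to shrink toward zero, so every point looks roughly equally distant and a purely distance-based outlier score separates points poorly.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^n_dims.

    Illustrative assumption: data is uniform and independent per dimension.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast is expected to fall as dimensionality rises.
    for n_dims in (2, 20, 200, 2000):
        print(n_dims, round(relative_contrast(500, n_dims), 3))
```

Real data sets are rarely uniform, so the effect may be weaker or stronger in practice. The experiment only motivates why the abstract turns to lower dimensional projections of the data rather than full-dimensional distances.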


