CoCo: coding cost for parameter-free outlier detection
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Research track papers
Pages 149-158  
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Christian Böhm  University of Munich, Munich, Germany
Katrin Haegler  University of Munich, Munich, Germany
Nikola S. Müller  Max Planck Institute of Biochemistry, Martinsried, Germany
Claudia Plant  Technische Universität München, Munich, Germany
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557042

ABSTRACT

How can we automatically spot all outstanding observations in a data set? This question arises in a large variety of applications, e.g., in economics, biology, and medicine. Existing approaches to outlier detection suffer from one or more of the following drawbacks: The results of many methods depend strongly on suitable parameter settings, such as the minimum cluster size or the number of desired outliers, which are very difficult to estimate without background knowledge of the data. Many methods implicitly assume Gaussian or uniformly distributed data, and/or their results are difficult to interpret. To cope with these problems, we propose CoCo, a technique for parameter-free outlier detection. The basic idea of our technique relates outlier detection to data compression: outliers are objects that cannot be effectively compressed given the data set. To avoid assuming a particular data distribution, CoCo relies on a very general data model combining the Exponential Power Distribution with Independent Components. We define an intuitive outlier factor based on the principle of Minimum Description Length, together with a novel algorithm for outlier detection. An extensive experimental evaluation on synthetic and real-world data demonstrates the benefits of our technique. Availability: The source code of CoCo and the data sets used in the experiments are available at: http://www.dbs.ifi.lmu.de/Forschung/KDD/Boehm/CoCo.
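
The compression idea in the abstract can be illustrated with a small sketch. This is not the CoCo algorithm: it omits the Independent Components step and the paper's outlier factor, and it fixes the shape parameter by hand, whereas CoCo is parameter-free. It only shows how a coding cost of -log2 f(x) under a one-dimensional exponential power density makes poorly fitting points expensive to encode. The function name `epd_coding_cost`, the `beta` argument, the use of the median as the location estimate, and the sample data are all assumptions made for illustration.

```python
import math
from statistics import median


def epd_coding_cost(values, beta=2.0):
    """Coding cost (-log2 of the density) of each value under a 1-D
    exponential power distribution fitted to `values`.

    Hypothetical helper: the shape `beta` is fixed by the caller, which the
    actual CoCo method does not require. The cost is a density-based code
    length, i.e. bits up to a constant that depends on the data resolution.
    """
    n = len(values)
    mu = median(values)  # simple, robust location estimate (an assumption)
    # Maximum-likelihood scale for a fixed shape: alpha^beta = (beta/n) * sum |x - mu|^beta
    alpha = (beta / n * sum(abs(v - mu) ** beta for v in values)) ** (1.0 / beta)
    if alpha == 0.0:
        raise ValueError("all values are identical; scale is undefined")
    # log of the normalizing constant beta / (2 * alpha * Gamma(1/beta))
    log_norm = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    costs = []
    for v in values:
        log_pdf = log_norm - (abs(v - mu) / alpha) ** beta
        costs.append(-log_pdf / math.log(2.0))  # convert nats to bits
    return costs


# Made-up sample: six values near 10 and one far away.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 17.5]
costs = epd_coding_cost(data)
most_expensive = max(range(len(data)), key=costs.__getitem__)
print(data[most_expensive], round(costs[most_expensive], 2))
```

In one dimension this cost grows monotonically with the distance from the location estimate, so the sketch only ranks points by how far they lie from the bulk of the data. The model described in the abstract is more general, which is what lets it avoid a fixed distributional assumption.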


