ACM Home Page
Please provide us with feedback. Feedback
Outlier-robust clustering using independent components
Full text PdfPdf (567 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Research Session 5: Clustering in High Dimensions table of contents
Pages 185-198  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Christian Böhm  University of Munich, Munich, Germany
Christos Faloutsos  Carnegie Mellon University, Pittsburgh, PA, USA
Claudia Plant  Technical University of Munich, Munich, Germany
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 30,   Downloads (12 Months): 351,   Citation Count: 2
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376638
What is a DOI?

ABSTRACT

How can we efficiently find a clustering, i.e. a concise description of the cluster structure, of a given data set which contains an unknown number of clusters of different shape and distribution and is contaminated by noise? Most existing clustering methods are restricted to the Gaussian cluster model and are very sensitive to noise. If the cluster content follows a non-Gaussian distribution and/or the data set contains a few outliers belonging to no cluster, then the computed data distribution does not match well the true data distribution, or an unnaturally high number of clusters is required to represent the true data distribution of the data set. In this paper we propose OCI (Outlier-robust Clustering using Independent Components), a clustering method which overcomes these problems by (1) applying the exponential power distribution (EPD) as cluster model which is a generalization of Gaussian, uniform, Laplacian and many other distribution functions, (2) applying the Independent Component Analysis (ICA) for both determining the main directions inside a cluster as well as finding split planes in a top-down clustering approach, and (3) defining an efficient and effective filter for outliers, based on EPD and ICA. Our method is parameter-free and as a top-down clustering approach very efficient. An extensive experimental evaluation shows both the accuracy of the obtained clustering result as well as the efficiency of our method.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
C. L. Blake and C. J. Merz. ?UCI Repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html?.
5
6
7
 
8
A. P. Dempster, N. M. Laird, and D. B. Rubin. "Maximum Likelihood from Incomplete Data via the EM Algorithm". In J Roy Stat Soc, number 39, pages 1--31, 1977.
9
 
10
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD Conference, 1996.
 
11
P. Grünwald. A tutorial introduction to the minimum description length principle. Advances in Minimum Description Length: Theory and Applications, 2005.
 
12
G. Hamerly and C. Elkan. Learning the k in k-means. In NIPS Conference, 2003.
 
13
A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. 2001.
 
14
A. Mineo and M. Ruggieri. A software tool for the exponential power distribution: The normalp package. Journal of Statistical Software, 12(4), 1 2005.
 
15
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm, 2001.
 
16
17
18


Collaborative Colleagues:
Christian Böhm: colleagues
Christos Faloutsos: colleagues
Claudia Plant: colleagues