ACM Home Page
Please provide us with feedback. Feedback
Robust information-theoretic clustering
Full text PdfPdf (2.22 MB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Philadelphia, PA, USA
SESSION: Research track papers table of contents
Pages: 65 - 75  
Year of Publication: 2006
ISBN:1-59593-339-5
Authors
Christian Böhm  University of Munich, Munich, Germany
Christos Faloutsos  CMU, Pittsburgh, PA
Jia-Yu Pan  CMU, Pittsburgh, PA
Claudia Plant  UMIT, Hall, Austria
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 19,   Downloads (12 Months): 138,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1150402.1150414
What is a DOI?

ABSTRACT

How do we find a natural clustering of a real world point set, which contains an unknown number of clusters with different shapes, and which may be contaminated by noise? Most clustering algorithms were designed with certain assumptions (Gaussianity), they often require the user to give input parameters, and they are sensitive to noise. In this paper, we propose a robust framework for determining a natural clustering of a given data set, based on the minimum description length (MDL) principle. The proposed framework, Robust Information-theoretic Clustering (RIC), is orthogonal to any known clustering algorithm: given a preliminary clustering, RIC purifies these clusters from noise, and adjusts the clusterings such that it simultaneously determines the most natural amount and shape (subspace) of the clusters. Our RIC method can be combined with any clustering technique ranging from K-means and K-medoids to advanced methods such as spectral clustering. In fact, RIC is even able to purify and improve an initial coarse clustering, even if we start with very simple methods such as grid-based space partitioning. Moreover, RIC scales well with the data set size. Extensive experiments on synthetic and real world data sets validate the proposed RIC framework.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
5
6
 
7
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD Conference, 1996.
 
8
P. Grünwald. A tutorial introduction to the minimum description length principle. Advances in Minimum Description Length: Theory and Applications, 2005.
9
 
10
G. Hamerly and C. Elkan. Learning the k in k-means. In Proceedings of NIPS, 2003.
 
11
 
12
I. Jolliffe. Principal Component Analysis. Springer Verlag, 1986.
 
13
F. Murtagh. A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4):354--359, 1983.
 
14
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, 2001.
 
15
 
16
17
 
18
N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. CoRR, physics/0004057, 2000.
19
 
20
 
21
22


Collaborative Colleagues:
Christian Böhm: colleagues
Christos Faloutsos: colleagues
Jia-Yu Pan: colleagues
Claudia Plant: colleagues