Assessment and pruning of hierarchical model based clustering
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Washington, D.C.
SESSION: Research track
Pages: 197 - 205  
Year of Publication: 2003
ISBN: 1-58113-737-0
Authors
Jeremy Tantrum, University of Washington, Seattle, WA
Alejandro Murua, University of Washington, Seattle, WA
Werner Stuetzle, University of Washington, Seattle, WA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/956750.956775

ABSTRACT

The goal of clustering is to identify distinct groups in a dataset. The basic idea of model-based clustering is to approximate the data density by a mixture model, typically a mixture of Gaussians, and to estimate the parameters of the component densities, the mixing fractions, and the number of components from the data. The number of distinct groups in the data is then taken to be the number of mixture components, and the observations are partitioned into clusters (estimates of the groups) using Bayes' rule. If the groups are well separated and look Gaussian, then the resulting clusters will indeed tend to be "distinct" in the most common sense of the word - contiguous, densely populated areas of feature space, separated by contiguous, relatively empty regions. If the groups are not Gaussian, however, this correspondence may break down; an isolated group with a non-elliptical distribution, for example, may be modeled by not one, but several mixture components, and the corresponding clusters will no longer be well separated. We present methods for assessing the degree of separation between the components of a mixture model and between the corresponding clusters. We also propose a new clustering method that can be regarded as a hybrid between model-based and nonparametric clustering. The hybrid clustering algorithm prunes the cluster tree generated by hierarchical model-based clustering. Starting with the tree corresponding to the mixture model chosen by the Bayesian Information Criterion, it progressively merges clusters that do not appear to correspond to different modes of the data density.
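The two ideas in the abstract, partitioning observations by Bayes' rule and checking whether two mixture components correspond to different modes of the density, can be illustrated with a small one-dimensional sketch. This is not the authors' procedure, which the abstract does not spell out. The function names (`posteriors`, `assign`, `valley_ratio`), the grid-based valley check, and the example parameters are all assumptions made for illustration, and the mixture parameters are taken as given rather than estimated.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Density of the Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def posteriors(x, weights, means, sds):
    """Bayes' rule: probability of each component given the observation x.

    Assumes x is not so far from every mean that all densities underflow to 0.
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def assign(x, weights, means, sds):
    """Index of the component with the highest posterior probability."""
    post = posteriors(x, weights, means, sds)
    return max(range(len(post)), key=post.__getitem__)

def valley_ratio(i, j, weights, means, sds, grid=200):
    """Hypothetical separation score for components i and j.

    Lowest mixture density on the segment between the two means, divided by
    the lower of the densities at the two means. A value near 1 means there
    is no dip between the means (a candidate for merging); a value near 0
    means the means are separated by a low-density region.
    """
    lo, hi = sorted((means[i], means[j]))
    xs = [lo + (hi - lo) * t / grid for t in range(grid + 1)]
    dens = [mixture_pdf(x, weights, means, sds) for x in xs]
    return min(dens) / min(dens[0], dens[-1])

# Illustrative parameters (assumed, not from the paper): components 0 and 1
# overlap heavily, as if one non-Gaussian group had been split in two, while
# component 2 is well separated from both.
weights = [0.3, 0.3, 0.4]
means = [0.0, 1.5, 8.0]
sds = [1.0, 1.0, 1.0]

print(assign(7.2, weights, means, sds))
print(valley_ratio(0, 1, weights, means, sds))
print(valley_ratio(1, 2, weights, means, sds))
```

In this sketch, a pruning step would merge the pair whose ratio is closest to 1 and leave pairs with a small ratio apart. Any cut-off between the two is a further assumption that the abstract does not specify.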

