Entropy-based criterion in categorical clustering
Source: ACM International Conference Proceeding Series; Vol. 69
Proceedings of the twenty-first international conference on Machine learning
Banff, Alberta, Canada
Page: 68  
Year of Publication: 2004
ISBN:1-58113-828-5
Authors
Tao Li  University of Rochester, Rochester, NY
Sheng Ma  IBM T. J. Watson Research Center, Hawthorne, NY
Mitsunori Ogihara  University of Rochester, Rochester, NY
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1015330.1015404

ABSTRACT

Entropy-type measures of cluster heterogeneity have long been in use. This paper studies the entropy-based criterion for clustering categorical data. It first shows that the criterion can be derived within the formal framework of probabilistic clustering models, and it establishes the connection between the criterion and the approach based on dissimilarity coefficients. An iterative Monte Carlo procedure is then presented to search for partitions that minimize the criterion. Experiments are conducted to show the effectiveness of the proposed procedure.
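
To make the idea concrete, the following is a minimal Python sketch, not the procedure from the paper. The abstract gives neither the formula nor the search details, so everything here is an assumption: the criterion is taken to be the size-weighted sum of per-attribute Shannon entropies within each cluster, and the search is a simple randomized reassignment loop that keeps a move only when the criterion does not increase. The names `cluster_entropy`, `expected_entropy`, and `monte_carlo_search` are hypothetical.

```python
import math
import random
from collections import Counter


def cluster_entropy(rows):
    """Sum, over attributes, of the Shannon entropy (in nats) of the values in `rows`."""
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*rows):  # iterate attribute by attribute
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log(p)
    return total


def expected_entropy(data, labels, k):
    """Assumed criterion: cluster entropies weighted by relative cluster size."""
    n = len(data)
    total = 0.0
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        total += len(members) / n * cluster_entropy(members)
    return total


def monte_carlo_search(data, k, iterations=2000, seed=0):
    """Randomized search: move one random point at a time, keep non-worsening moves."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]  # random initial partition
    best = expected_entropy(data, labels, k)
    for _ in range(iterations):
        i = rng.randrange(len(data))
        old, new = labels[i], rng.randrange(k)
        if new == old:
            continue
        labels[i] = new
        score = expected_entropy(data, labels, k)
        if score <= best:
            best = score      # accept the move
        else:
            labels[i] = old   # revert the move
    return labels, best
```

Calling `monte_carlo_search(data, k=2)` on a list of equal-length tuples of categorical values returns a label list and the criterion value reached. Like any greedy randomized search, this sketch can stop in a local minimum, and it recomputes the criterion from scratch at every step for clarity rather than speed.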

