Entropy-based criterion in categorical clustering
Source
Proceedings of the twenty-first international conference on Machine learning, Banff, Alberta, Canada, 2004
ACM International Conference Proceeding Series, Vol. 69, p. 68
ISBN: 1-58113-828-5
Authors
Tao Li, University of Rochester, Rochester, NY
Sheng Ma, IBM T. J. Watson Research Center, Hawthorne, NY
Mitsunori Ogihara, University of Rochester, Rochester, NY
ABSTRACT
Entropy-type measures of cluster heterogeneity have long been used. This paper studies the entropy-based criterion for clustering categorical data. It first shows that the criterion can be derived within the formal framework of probabilistic clustering models, and establishes its connection to approaches based on dissimilarity coefficients. It then presents an iterative Monte Carlo procedure that searches for partitions minimizing the criterion. Experiments demonstrate the effectiveness of the proposed procedure.
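The abstract combines two ingredients: an entropy-based criterion scored over a partition of categorical records, and an iterative Monte Carlo search for a partition that minimizes it. The Python sketch below illustrates both, under stated assumptions. It assumes the criterion takes the common "expected entropy" form, the size-weighted sum over clusters of the per-attribute Shannon entropies inside each cluster; the paper's own derivation may weight or normalize the terms differently. The search is a plain randomized hill climb (move a random record to a random cluster, keep the move if the criterion does not increase). It is not a reproduction of the authors' procedure. All function names (`cluster_entropy`, `expected_entropy`, `monte_carlo_search`) are hypothetical.

```python
import math
import random
from collections import Counter


def cluster_entropy(rows):
    """Sum over attributes of the Shannon entropy (bits) of that attribute within rows."""
    if not rows:
        return 0.0
    n = len(rows)
    total = 0.0
    for j in range(len(rows[0])):
        counts = Counter(r[j] for r in rows)
        total -= sum((c / n) * math.log2(c / n) for c in counts.values())
    return total


def expected_entropy(data, labels, k):
    """Assumed criterion: size-weighted average of within-cluster entropies.

    Lower values mean more homogeneous clusters; 0 means every cluster is pure.
    """
    n = len(data)
    score = 0.0
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        score += len(members) / n * cluster_entropy(members)
    return score


def monte_carlo_search(data, k, iters=2000, seed=0):
    """Hypothetical randomized hill climb, not the paper's exact procedure.

    Each step moves one random record to a random other cluster and keeps the
    move only if the criterion does not increase (ties are accepted so the
    search can cross plateaus).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    best = expected_entropy(data, labels, k)
    for _ in range(iters):
        i = rng.randrange(len(data))
        old, new = labels[i], rng.randrange(k)
        if new == old:
            continue
        labels[i] = new
        score = expected_entropy(data, labels, k)
        if score <= best:
            best = score          # accept the move
        else:
            labels[i] = old       # revert the move
    return labels, best
```

For example, `monte_carlo_search([("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")], k=2)` groups the two identical pairs into separate, pure clusters. Recomputing the whole criterion after every move keeps the sketch short but costs time proportional to the data size per step; maintaining per-cluster value counts and updating them incrementally would be the natural optimization. Like any single-move local search, it can stop in a local minimum on harder data.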