Adapting the right measures for K-means clustering

Source: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France
Session: Research track papers
Pages: 877-886
Year of Publication: 2009
ISBN: 978-1-60558-495-9

Authors
Junjie Wu, Beihang University, Beijing, China
Hui Xiong, Rutgers University, Newark, NJ, USA
Jian Chen, Tsinghua University, Beijing, China

ABSTRACT
Clustering validation is a long-standing challenge in the clustering literature. While many validation measures have been developed for evaluating the performance of clustering algorithms, these measures often provide inconsistent information about clustering performance, and the most suitable measures to use in practice remain unknown. This paper fills this gap with an organized study of 16 external validation measures for K-means clustering. Specifically, we first highlight the importance of measure normalization when evaluating clustering performance on data with imbalanced class distributions, and we provide normalization solutions for several measures. In addition, we summarize the major properties of these external measures; these properties can serve as guidance for selecting validation measures in different application scenarios. Finally, we reveal the interrelationships among these external measures. Through mathematical transformation, we show that some validation measures are equivalent. In addition, some measures exhibit consistent validation performance. Most importantly, we provide a guideline for selecting the most suitable validation measures for K-means clustering.
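The abstract does not list the 16 measures or the normalization formulas, so the following is only a minimal sketch under stated assumptions. It uses three well-known external measures as illustrative stand-ins: the unnormalized Rand statistic, its chance-adjusted (Hubert-Arabie) form, and the Mirkin metric. These are not necessarily the paper's own definitions or normalization scheme, and the helper names (`pair_counts`, `rand_index`, `adjusted_rand_index`, `mirkin_metric`) are hypothetical. The sketch illustrates two points from the abstract: on imbalanced classes an unnormalized measure can give a trivial one-cluster solution a high score, and some measures are equivalent up to a simple transformation.

```python
from collections import Counter
from math import comb


def pair_counts(classes, clusters):
    """Return (sum_ij, sum_i, sum_j, total) as numbers of object pairs,
    taken from the contingency table cells, rows (classes) and columns
    (clusters), plus the total number of pairs C(n, 2)."""
    if len(classes) != len(clusters):
        raise ValueError("label sequences must have equal length")
    if len(classes) < 2:
        raise ValueError("need at least two objects to form a pair")
    cells = Counter(zip(classes, clusters))                      # n_ij
    sum_ij = sum(comb(v, 2) for v in cells.values())
    sum_i = sum(comb(v, 2) for v in Counter(classes).values())   # n_i.
    sum_j = sum(comb(v, 2) for v in Counter(clusters).values())  # n_.j
    return sum_ij, sum_i, sum_j, comb(len(classes), 2)


def rand_index(classes, clusters):
    """Unnormalized Rand statistic: share of object pairs on which the
    class partition and the cluster partition agree."""
    sum_ij, sum_i, sum_j, total = pair_counts(classes, clusters)
    # agreements = all pairs minus pairs joined by exactly one partition
    return (total + 2 * sum_ij - sum_i - sum_j) / total


def adjusted_rand_index(classes, clusters):
    """Rand index normalized against its expected value under random
    labeling (Hubert-Arabie adjustment): (index - E) / (max - E)."""
    sum_ij, sum_i, sum_j, total = pair_counts(classes, clusters)
    expected = sum_i * sum_j / total
    maximum = (sum_i + sum_j) / 2
    if maximum == expected:
        # Only happens when both partitions are trivial in the same way
        # (all-in-one or all-singletons); treated as perfect agreement.
        return 1.0
    return (sum_ij - expected) / (maximum - expected)


def mirkin_metric(classes, clusters):
    """Mirkin metric: sum n_i.^2 + sum n_.j^2 - 2 * sum n_ij^2."""
    cells = Counter(zip(classes, clusters))
    return (sum(v * v for v in Counter(classes).values())
            + sum(v * v for v in Counter(clusters).values())
            - 2 * sum(v * v for v in cells.values()))


if __name__ == "__main__":
    # Imbalanced classes: 95 objects of class "A", 5 of class "B".
    classes = ["A"] * 95 + ["B"] * 5
    trivial = [0] * 100  # a degenerate clustering: everything in one cluster
    n = len(classes)
    print(round(rand_index(classes, trivial), 3))      # 0.904
    print(adjusted_rand_index(classes, trivial))       # 0.0
    # Rand and Mirkin are tied by R = 1 - M / (n (n - 1)).
    print(round(1 - mirkin_metric(classes, trivial) / (n * (n - 1)), 3))  # 0.904
```

With 95 objects in one class and 5 in the other, putting all 100 objects into a single cluster gives a Rand statistic of about 0.904 but an adjusted value of 0.0. Normalization is meant to expose exactly this kind of gap. The last line shows that ranking clusterings by Rand or by Mirkin gives the same order, because one is a fixed linear transformation of the other for a given n.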