Adapting the right measures for K-means clustering
Full text: Mov (24:28), PDF (696 KB)
Source
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France
Session: Research track papers
Pages 877-886
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Junjie Wu  Beihang University, Beijing, China
Hui Xiong  Rutgers University, Newark, NJ, USA
Jian Chen  Tsinghua University, Beijing, China
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 weeks): 68; Downloads (12 months): 236; Citation count: 0
DOI: 10.1145/1557019.1557115 (http://doi.acm.org/10.1145/1557019.1557115)

ABSTRACT

Clustering validation is a long-standing challenge in the clustering literature. Although many validation measures have been developed for evaluating the performance of clustering algorithms, these measures often provide inconsistent information about clustering performance, and which measures are most suitable in practice remains unknown. This paper fills this gap with an organized study of 16 external validation measures for K-means clustering. Specifically, we first show the importance of measure normalization when evaluating clustering performance on data with imbalanced class distributions, and we provide normalization solutions for several measures. In addition, we summarize the major properties of these external measures; these properties can guide the selection of validation measures in different application scenarios. Finally, we reveal the interrelationships among these external measures. Through mathematical transformations, we show that some validation measures are equivalent, while others yield consistent validation results. Most importantly, we provide a guideline for selecting the most suitable validation measures for K-means clustering.
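The abstract's two technical claims (normalization matters on imbalanced classes, and some measures are equivalent by transformation) can be illustrated with a minimal sketch. It is not the paper's code or its normalization scheme. It assumes the clustering result has been summarized as a contingency table (rows are true classes, columns are clusters) and computes purity, the Rand index [18], the Mirkin metric [17], and the adjusted Rand index of Hubert and Arabie [9]. The function names and the 95/5 example counts are hypothetical, chosen only for illustration. The Rand/Mirkin pair shows one well-known equivalence, R = 1 - M / (n(n-1)); whether it is among the paper's specific cases is not stated here.

```python
from math import comb

# Assumed convention for this sketch:
# table[i][j] = number of objects of true class i placed in cluster j.

def _margins(table):
    row_sums = [sum(row) for row in table]          # class sizes
    col_sums = [sum(col) for col in zip(*table)]    # cluster sizes
    return row_sums, col_sums

def _pair_counts(table):
    """Count object pairs that are co-clustered and/or co-classified."""
    row_sums, col_sums = _margins(table)
    n = sum(row_sums)
    same_both = sum(comb(x, 2) for row in table for x in row)
    same_class = sum(comb(r, 2) for r in row_sums)
    same_cluster = sum(comb(c, 2) for c in col_sums)
    return n, same_both, same_class, same_cluster

def rand_index(table):
    """Fraction of object pairs on which classes and clusters agree."""
    n, both, cls, clu = _pair_counts(table)
    total = comb(n, 2)
    apart_both = total - (cls + clu - both)  # pairs separated in both partitions
    return (both + apart_both) / total

def mirkin_metric(table):
    """Mirkin's metric: sum n_i.^2 + sum n_.j^2 - 2 * sum n_ij^2."""
    row_sums, col_sums = _margins(table)
    return (sum(r * r for r in row_sums) + sum(c * c for c in col_sums)
            - 2 * sum(x * x for row in table for x in row))

def adjusted_rand_index(table):
    """Hubert-Arabie ARI: (index - expected) / (max index - expected)."""
    n, both, cls, clu = _pair_counts(table)
    expected = cls * clu / comb(n, 2)
    max_index = (cls + clu) / 2
    # Undefined (0/0) when both partitions are trivial; not handled here.
    return (both - expected) / (max_index - expected)

def purity(table):
    """Share of objects that belong to the majority class of their cluster."""
    n = sum(sum(row) for row in table)
    return sum(max(col) for col in zip(*table)) / n

if __name__ == "__main__":
    # Hypothetical imbalanced data: 95 objects in class A, 5 in class B.
    trivial = [[95], [5]]        # every object in a single cluster
    better = [[90, 5], [0, 5]]   # all of class B isolated in cluster 2
    for name, t in (("trivial", trivial), ("better", better)):
        n = sum(map(sum, t))
        print(name,
              f"purity={purity(t):.3f}",
              f"rand={rand_index(t):.3f}",
              f"rand_via_mirkin={1 - mirkin_metric(t) / (n * (n - 1)):.3f}",
              f"ari={adjusted_rand_index(t):.3f}")
```

On these made-up counts, purity (0.950) and the Rand index (0.904) are identical for both tables, even though the first one ignores class B entirely; the adjusted Rand index is 0.000 for the one-cluster solution and about 0.605 for the second. The Rand value recomputed from the Mirkin metric matches the direct computation, which is the sense in which the two measures carry the same information.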


REFERENCES

Note: OCR errors may be present in this reference list, which was extracted from the full-text article. ACM has opted to expose the complete list rather than only the corrected and linked references.

[1] A. Ben-Hur and I. Guyon. Detecting stable clusters using principal component analysis. In Methods in Molecular Biology. Humana Press, 2003.
[4] M. DeGroot and M. Schervish. Probability and Statistics (3rd Edition). Addison-Wesley, 2001.
[5] E. B. Fowlkes and C. L. Mallows. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78:553-569, 1983.
[6] L. A. Goodman and W. H. Kruskal. Measures of association for cross classification. Journal of the American Statistical Association, 49:732-764, 1954.
[8] L. Hubert. Nominal scale response agreement as a generalized correlation. British Journal of Mathematical and Statistical Psychology, 30:98-103, 1977.
[9] L. Hubert and P. Arabie. Comparing partitions. Journal of Classification, 2:193-218, 1985.
[11] G. Karypis. CLUTO: Software for clustering high-dimensional datasets, version 2.1.1. Oct. 2007.
[12] M. G. Kendall. Rank Correlation Methods. New York: Hafner Publishing Co., 1955.
[14] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I: Statistics. University of California Press, 1967.
[15] MathWorks. K-means clustering in Statistics Toolbox.
[17] B. Mirkin. Mathematical Classification and Clustering. Kluwer Academic Press, 1996.
[18] W. M. Rand. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66:846-850, 1971.
[20] M. Steinbach, G. Karypis, and V. Kumar. A comparison of document clustering techniques. In Workshop on Text Mining, KDD, 2000.
[21] A. Strehl, J. Ghosh, and R. J. Mooney. Impact of similarity measures on web-page clustering. In Workshop on Artificial Intelligence for Web Search, AAAI, pages 58-64, 2000.
[22] TREC. Text Retrieval Conference. Oct. 2007.
