ABSTRACT
We argue that there are many clustering algorithms because the notion of "cluster" cannot be precisely defined. Clustering is in the eye of the beholder; as a result, researchers have proposed many inductive principles and models, each with a corresponding optimization problem that can only be solved approximately, and by an even larger number of algorithms. Therefore, comparing clustering algorithms must take into account a careful understanding of the inductive principles involved.