|
ABSTRACT
In a clustering problem, the aim is to partition a given set of n points in d-dimensional space into k groups, called clusters, so that points within each cluster are near each other. Two objective functions frequently used to measure the performance of a clustering algorithm are, for any L4 metric, (a) the maximum distance between pairs of points in the same cluster, and (b) the maximum distance between points in each cluster and a chosen cluster center; we refer to either measure as the cluster size.
We show that one cannot approximate the optimal cluster size for a fixed number of clusters within a factor close to 2 in polynomial time, for two or more dimensions, unless P=NP. We also present an algorithm that achieves this factor of 2 in time &Ogr;(n log k), and show that this running time is optimal in the algebraic decision tree model. For a fixed cluster size, on the other hand, we give a polynomial time approximation scheme that estimates the optimal number of clusters under the second measure of cluster size within factors arbitrarily close to 1. Our approach is extended to provide approximation algorithms for the restricted centers, suppliers, and weighted suppliers problems that run in optimal &Ogr;(n log k) time and achieve optimal or nearly optimal approximation bounds.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
Ba83
|
Brenda S. Baker, "Approximation algorithms for NP-complete problems on planar graphs." Proc. 24th IEEE FOCS (1983), pp. 265 273.
|
 |
Be83
|
|
| |
Br77
|
P. Bruckcr, "On the complexity of clustering problems," in Optimization and Operations Research (R. Hcnn, B. Kortc, W. Olctti eds.), Springer-Verlag, Berlin, Heidelberg, New York (1977), pp. 45-54.
|
| |
Eq87
|
William Equitz, "Fast algorithms for vector quantization picture coding," Proc. International Conference on Acoustics, Speech, an(t Signal Processing (1987), I)P- 725 728.
|
| |
FPT81
|
R.J. Fowler, M.S. Paterson and S.L. Tanimoto, "Optimal packing and covering in the plane are NP-complete," Inf. Proc. Letters, Vol. 12, No. 3 (1981), pp. 133 137.
|
| |
GJ77
|
M.R. Garey and D.S. Johnson, "The rectilinear Steiner tree problem is NP- complete," Siam J. Appl. Math. 32 (1977), pp. 826-834.
|
| |
GJ79
|
|
| |
Go85
|
Te6filo F. Gonz~ez, "Clustering to minimize the maximum interclustcr distance," Theoretical Computer Science 38, North- Holland (1985), pp. 293 306.
|
| |
Gr84
|
Robert M. Gray, "Vector quantization," IEEE ASSP Magazim:, April (1984), pp. 4-29.
|
| |
HS85
|
D.S. Hochbaum and D.B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, Vol. 10, No. 2 (1985), the. 180 184.
|
 |
HS86
|
|
| |
HN79
|
W.L. Hsu and G.L. Nemhauser, "Easy and hard bottleneck location problems," Discr. Appl. Math. 1 (1979), pp. 209 215.
|
| |
MS84
|
N. Megiddo and K.J. Supowit, "On the complexity of some common geometric location problems," Siam J. Comput., Vol. 13, No. 1 (1984), pp. 182 196.
|
| |
Me88
|
Stuart G. Mentzer, "Lower bounds on metric k-center problems," Manuscript, 1988.
|
| |
PS85
|
|
| |
Va86
|
Pravin M. Vaidya, "An optimal algorithm for the all-nearest-neighbors problent," Proc. 27th IEEE FOCS (1986), pp. 117 122.
|
| |
Vs87
|
Stephen Vavasis, personal communication.
|
CITED BY 54
|
|
|
|
|
|
|
|
|
|
|
Moses Charikar , Chandra Chekuri , Tomás Feder , Rajeev Motwani, Incremental clustering and dynamic information retrieval, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.626-635, May 04-06, 1997, El Paso, Texas, United States
|
|
|
Javed Aslam , Katya Pelekhov , Daniela Rus, A practical clustering algorithm for static and dynamic information organization, Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms, p.51-60, January 17-19, 1999, Baltimore, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sunil Arya , David M. Mount , Nathan S. Netanyahu , Ruth Silverman , Angela Wu, An optimal algorithm for approximate nearest neighbor searching, Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms, p.573-582, January 23-25, 1994, Arlington, Virginia, United States
|
|
|
|
|
|
Marek Chrobak , David Eppstein , Giuseppe F. Italiano , Moti Yung, Efficient sequential and parallel algorithms for computing recovery points in trees and paths, Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms, p.158-167, January 28-30, 1991, San Francisco, California, United States
|
|
|
|
|
|
Mary Inaba , Naoki Katoh , Hiroshi Imai, Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract), Proceedings of the tenth annual symposium on Computational geometry, p.332-339, June 06-08, 1994, Stony Brook, New York, United States
|
|
|
Nina Mishra , Dan Oblinger , Leonard Pitt, Sublinear time approximate clustering, Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, p.439-447, January 07-09, 2001, Washington, D.C., United States
|
|
|
|
|
|
Jon Kleinberg , Christos Papadimitriou , Prabhakar Raghavan, Segmentation problems, Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.473-482, May 24-26, 1998, Dallas, Texas, United States
|
|
|
Moses Charikar , Samir Khuller , David M. Mount , Giri Narasimhan, Algorithms for facility location problems with outliers, Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, p.642-651, January 07-09, 2001, Washington, D.C., United States
|
|
|
Magnús M. Halldórsson , Kazuo Iwano , Naoki Katoh , Takeshi Tokuyama, Finding subsets maximizing minimum structures, Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms, p.150-159, January 22-24, 1995, San Francisco, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Javed Aslam , Katya Pelekhov , Daniela Rus, Static and dynamic information organization with star clusters, Proceedings of the seventh international conference on Information and knowledge management, p.208-217, November 02-07, 1998, Bethesda, Maryland, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rong Ge , Martin Ester , Byron J. Gao , Zengjian Hu , Binay Bhattacharya , Boaz Ben-Moshe, Joint cluster analysis of attribute data and relationship data: The connected k-center problem, algorithms and applications, ACM Transactions on Knowledge Discovery from Data (TKDD), v.2 n.2, p.1-35, July 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Marco Furini , Filippo Geraci , Manuela Montangero , Marco Pellegrini, VISTO: visual storyboard for web video browsing, Proceedings of the 6th ACM international conference on Image and video retrieval, p.635-642, July 09-11, 2007, Amsterdam, The Netherlands
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|