ACM Home Page
Please provide us with feedback. Feedback
Approximate clustering without the approximation
Full text PdfPdf (403 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages 1068-1077  
Year of Publication: 2009
Authors
Maria-Florina Balcan  Carnegie Mellon University
Avrim Blum  Carnegie Mellon University
Anupam Gupta  Carnegie Mellon University
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 80,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

Approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much research effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering.

This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also yield more accurate clusterings. E.g., for many problems such as clustering proteins by function, or clustering images by subject, there is some unknown correct "target" clustering and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close pointwise to the truth.

In this paper, we show that if we make this implicit assumption explicit---that is, if we assume that any c-approximation to the given clustering objective φ is ε-close to the target---then we can produce clusterings that are O(ε)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2.

Our results also highlight a surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ε-close to the target, and assuming that any approximately optimal solution is ε-close to the target, even for approximation factor say c = 1.01. In the former case, the problem of finding a solution that is O(ε)-close to the target remains computationally hard, and yet for the latter we have an efficient algorithm.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
{AK05} S. Arora and R. Kannan. Learning mixtures of arbitrary gaussians. In Proc. 37th STOC, 2005.
 
3
{AM05} D. Achlioptas and F. McSherry. On spectral learning of mixtures of distributions. In COLT, 2005.
4
 
5
6
7
 
8
 
9
10
 
11
{CS04} A. Czumaj and C. Sohler. Sublinear-time approximation for clustering via random samples. In Proc. 31st ICALP, pages 396--407, 2004.
 
12
 
13
{DGL96} L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, 1996.
 
14
{DHS01} R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, 2001.
15
 
16
17
18
19
 
20
 
21
{KSV05} R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. In Proc. 18th COLT, 2005.
 
22
{Llo82} S. P. Lloyd. Least squares quantization in PCM. IEEE Trans. Inform. Theory, 28(2):129--137, 1982.
23
 
24
 
25
26
 
27

Collaborative Colleagues:
Maria-Florina Balcan: colleagues
Avrim Blum: colleagues
Anupam Gupta: colleagues