ACM Home Page
Please provide us with feedback. Feedback
Polynomial-time approximation schemes for geometric min-sum median clustering
Full text PdfPdf (258 KB)
Source Journal of the ACM (JACM) archive
Volume 49 ,  Issue 2  (March 2002) table of contents
Pages: 139 - 156  
Year of Publication: 2002
ISSN:0004-5411
Authors
Rafail Ostrovsky  Telcordia Technologies, Morristown, New Jersey
Yuval Rabani  Technion---IIT, Haifa, Israel
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 69,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/506147.506149
What is a DOI?

ABSTRACT

The Johnson--Lindenstrauss lemma states that n points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n) dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings.More specifically, we address the following clustering problem. Given n points in a larger set (e.g., ℝd) endowed with a distance function (e.g., L2 distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center," so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for k = 2. We give polynomial-time approximation schemes for this problem in several settings, including the binary cube {0,1}d with Hamming distance, and ℝd either with L1 distance, or with L2 distance, or with the square of L2 distance. In all these settings, the best previous results were constant factor approximation guarantees.We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
Alon, N., and Spencer, J. 1992. The Probabilistic Method. Wiley. New York.
 
4
5
6
7
 
8
 
9
 
10
 
11
 
12
13
14
15
 
16
 
17
 
18
Deerwester, S., Dumais, S. T., Landauer, T. K., Furnas, G. W., and Harshman, R. A. 1990. Indexing by latent semantic analysis. J. Soc. Inf. Sci. 41, 6, 391--407.
 
19
 
20
Dumais, S. T. 1991. Improving the retrieval of information from external sources. Behav. Res. Meth. Instrum. Comput. 23, 2, 229--236.
21
 
22
 
23
 
24
 
25
 
26
27
 
28
 
29
Johnson, W. B., and Lindenstrauss, J. 1984. Extensions of Lipschitz mappings into Hilbert space. Contemp. Math. 26, 189--206.
 
30
 
31
Kannan, R., and Vinay, V.2001. The Manjara Meta-Search Engine. http://cluster.cs. yale.edu/about.html.
 
32
Karp, R. M. 2000. The genomics revolution and its challenges for algorithmic research. Bull. EATCS 71 (June), 151--159.
33
34
35
 
36
 
37
Linial, N., London, E., and Rabinovich, Y. 1995. The geometry of graphs and some of its algorithmic applications. Combinatorica 15, 2, 215--245.
 
38
 
39
40
 
41
Zamir, O., Etzioni, O., Madani, O., and Karp, R. M. 1997. Fast and intuitive clustering of web documents. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining. (Newport Beach, Calif., Aug.), D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy, Eds. AAAI Press, pp. 287--290.


Collaborative Colleagues:
Rafail Ostrovsky: colleagues
Yuval Rabani: colleagues