|
ABSTRACT
We continue the investigation of problems concerning correlation clustering or clustering with qualitative information, which is a clustering formulation that has been studied recently [5, 7, 8, 3]. The basic setup here is that we are given as input a complete graph on n nodes (which correspond to nodes to be clustered) whose edges are labeled + (for similar pairs of items) and - (for dissimilar pairs of items). Thus we have only as input qualitative information on similarity and no quantitative distance measure between items. The quality of a clustering is measured in terms of its number of agreements, which is simply the number of edges it correctly classifies, that is the sum of number of - edges whose endpoints it places in different clusters plus the number of + edges both of whose endpoints it places within the same cluster.In this paper, we study the problem of finding clusterings that maximize the number of agreements, and the complementary minimization version where we seek clusterings that minimize the number of disagreements. We focus on the situation when the number of clusters is stipulated to be a small constant k. Our main result is that for every k, there is a polynomial time approximation scheme for both maximizing agreements and minimizing disagreements. (The problems are NP-hard for every k ≥ 2.) The main technical work is for the minimization version, as the PTAS for maximizing agreements follows along the lines of the property tester for Max k-CUT from [13].In contrast, when the number of clusters is not specified, the problem of minimizing disagreements was shown to be APX-hard [7], even though the maximization version admits a PTAS.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Amit Agarwal , Moses Charikar , Konstantin Makarychev , Yury Makarychev, O(√log n) approximation algorithms for min UnCut, min 2CNF deletion, and directed cut problems, Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, May 22-24, 2005, Baltimore, MD, USA
[doi> 10.1145/1060590.1060675]
|
 |
2
|
|
 |
3
|
Noga Alon , Konstantin Makarychev , Yury Makarychev , Assaf Naor, Quadratic forms on graphs, Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, May 22-24, 2005, Baltimore, MD, USA
[doi> 10.1145/1060590.1060664]
|
| |
4
|
|
| |
5
|
|
| |
6
|
A. Ben-Dor, R. Shamir, and Z. Yakhini. Clustering gene expression patterns. J Comp. Biol., 6:281--97, 1999.
|
| |
7
|
|
| |
8
|
|
 |
9
|
W. Fernandez de la Vega , Marek Karpinski , Claire Kenyon , Yuval Rabani, Approximation schemes for clustering problems, Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, June 09-11, 2003, San Diego, CA, USA
[doi> 10.1145/780542.780550]
|
| |
10
|
|
| |
11
|
E. Demaine and N. Immorlica. Correlation clustering with partial information. In Proc. of 6th APPROX, pages 1--13, 2003.
|
| |
12
|
D. Emanuel and A. Fiat. Correlation clustering---minimizing disagreements on arbitrary weighted graphs. In Proc. of 11th ESA, pages 208--20, 2003.
|
 |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
A. Nemirovski, C. Roos, and T. Terlaky. On maximization of quadratic form over intersection of ellipsoids with common center. Mathematical Programming, 86(3):463--473, 1999.
|
| |
17
|
|
| |
18
|
|
CITED BY 5
|
|
Gagan Aggarwal , Nir Ailon , Florin Constantin , Eyal Even-Dar , Jon Feldman , Gereon Frahling , Monika R. Henzinger , S. Muthukrishnan , Noam Nisan , Martin Pál , Mark Sandler , Anastasios Sidiropoulos, Theory research at Google, ACM SIGACT News, v.39 n.2, June 2008
|
|
|
|
|
|
|
|
|
|
|
|
Xin Li , Jun Yan , Weiguo Fan , Ning Liu , Shuicheng Yan , Zheng Chen, An online blog reading system by topic clustering and personalized ranking, ACM Transactions on Internet Technology (TOIT), v.9 n.3, p.1-26, July 2009
|
|