ACM Home Page
Please provide us with feedback. Feedback
Better streaming algorithms for clustering problems
Full text PdfPdf (237 KB)
Source Annual ACM Symposium on Theory of Computing archive
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing table of contents
San Diego, CA, USA
SESSION: Session 1B table of contents
Pages: 30 - 39  
Year of Publication: 2003
ISBN:1-58113-674-9
Authors
Moses Charikar  Princeton University
Liadan O'Callaghan  Stanford University
Rina Panigrahy  Cisco Systems
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 88,   Citation Count: 27
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/780542.780548
What is a DOI?

ABSTRACT

We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storage space. Our main result is a randomized algorithm for the k--Median problem which produces a constant factor approximation in one pass using storage space O(k poly log n). This is a significant improvement of the previous best algorithm which yielded a 2O(1/ε) approximation using O(nε) space. Next we give a streaming algorithm for the k--Median problem with an arbitrary distance function. We also study algorithms for clustering problems with outliers in the streaming model. Here, we give bicriterion guarantees, producing constant factor approximations by increasing the allowed fraction of outliers slightly.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
4
5
 
6
7
 
8
9
 
10
11
 
12
M. Dyer and A. M. Frieze. A simple heuristic for the p--center problem. Operations Research Letters, v. 3, 1985.
 
13
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. Proc. FOCS, 2000.
14
15
 
16
 
17
O. Kariv and S. L. Hakimi. An algorithmic approach to network location problems, part ii: p-medians. SIAM Journal of Appl. Math, v. 37, 1979.
 
18
M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In DIMACS series in Discrete Mathematics and Theoretical Computer Science, v. 50, 1999.
 
19
D. S. Hochbaum and D. B. Shmoys. A best possible approximation algorithm for the k--Center problem. Mathematics of Operations Research, v. 10, 1985.
20
 
21
 
22
23
 
24
 
25
26
27
 
28
 
29
R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Proc. UAI, 2002.
 
30
 
31
A. Meyerson, L. O'Callaghan and S. Plotkin. Approximating k--Median: A sampling-based approach. Unpublished manuscript, 2001.
 
32
 
33
 
34

CITED BY  27

Collaborative Colleagues:
Moses Charikar: colleagues
Liadan O'Callaghan: colleagues
Rina Panigrahy: colleagues