ACM Home Page
Please provide us with feedback. Feedback
The space complexity of pass-efficient algorithms for clustering
Full text PdfPdf (248 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm table of contents
Miami, Florida
Pages: 1157 - 1166  
Year of Publication: 2006
ISBN:0-89871-605-5
Authors
Kevin L. Chang  Yale University, New Haven
Ravi Kannan  Yale University, New Haven
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
: SIAM Activity Group on Discrete Mathematics
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 31,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1109557.1109685
What is a DOI?

ABSTRACT

We present multiple pass streaming algorithms for a basic clustering problem for massive data sets. If our algorithm is allotted 2l passes, it will produce an approximation with error at most ε using Õ(k32/l) bits of memory, the most critical resource for streaming computation. We demonstrate that this tradeoff between passes and memory allotted is intrinsic to the problem and model of computation by proving lower bounds on the memory requirements of any l pass randomized algorithm that are nearly matched by our upper bounds. To the best of our knowledge, this is the first time nearly matching bounds have been proved for such an exponential tradeoff for randomized computation.In this problem, we are given a set of n points drawn randomly according to a mixture of k uniform distributions and wish to approximate the density function of the mixture. The points are placed in a datastream (possibly in adversarial order), which may only be read sequentially by the algorithm. We argue that this models, among others, the datastream produced by a national census of the incomes of all citizens.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
4
5
6
 
7
 
8
 
9
 
10
11
12
 
13
 
14
15
 
16
 
17
 
18
 
19
J. I. Munro and M. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, 12:315--323, 1980.


Collaborative Colleagues:
Kevin L. Chang: colleagues
Ravi Kannan: colleagues