ACM Home Page
Please provide us with feedback. Feedback
Tight results for clustering and summarizing data streams
Full text PdfPdf (443 KB)
Source ACM International Conference Proceeding Series; Vol. 361 archive
Proceedings of the 12th International Conference on Database Theory table of contents
St. Petersburg, Russia
SESSION: Streams, data mining, complexity table of contents
Pages 268-275  
Year of Publication: 2009
ISBN:978-1-60558-423-2
Author
Sudipto Guha  University of Pennsylvania, Philadelphia, PA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 78,   Citation Count: 0
Additional Information:

abstract   references   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1514894.1514926
What is a DOI?

ABSTRACT

In this paper we investigate algorithms and lower bounds for summarization problems over a single pass data stream. In particular we focus on histogram construction and K-center clustering. We provide a simple framework that improves upon all previous algorithms on these problems in either the space bound, the approximation factor or the running time. The framework uses a notion of "streamstrapping" where summaries created for the initial prefixes of the data are used to develop better approximation algorithms. We also prove the first non-trivial lower bounds for these problems. We show that the stricter requirement that if an algorithm accurately approximates the error of every bucket or every cluster produced by it, then these upper bounds are almost the best possible. This property of accurate estimation is true of all known upper bounds on these problems.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
C. Buragohain, N. Shrivastava, and S. Suri. Space efficient streaming algorithms for the maximum error histogram. Proc of. ICDE, pages 1026--1035, 2007.
5
6
7
8
9
 
10
 
11
S. Guha and B. Harb. Approximation algorithms for wavelet transform coding of data streams. IEEE Trans. of Info. Theory, 2008.
12
13
 
14
 
15
 
16
17
 
18
 
19
20
 
21
 
22
 
23