| Tight results for clustering and summarizing data streams |
| Full text |
Pdf
(443 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 361
archive
Proceedings of the 12th International Conference on Database Theory
table of contents
St. Petersburg, Russia
SESSION: Streams, data mining, complexity
table of contents
Pages 268-275
Year of Publication: 2009
ISBN:978-1-60558-423-2
|
|
Author
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 11, Downloads (12 Months): 78, Citation Count: 0
|
|
|
ABSTRACT
In this paper we investigate algorithms and lower bounds for summarization problems over a single pass data stream. In particular we focus on histogram construction and K-center clustering. We provide a simple framework that improves upon all previous algorithms on these problems in either the space bound, the approximation factor or the running time. The framework uses a notion of "streamstrapping" where summaries created for the initial prefixes of the data are used to develop better approximation algorithms. We also prove the first non-trivial lower bounds for these problems. We show that the stricter requirement that if an algorithm accurately approximates the error of every bucket or every cluster produced by it, then these upper bounds are almost the best possible. This property of accurate estimation is true of all known upper bounds on these problems.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Swarup Acharya , Phillip B. Gibbons , Viswanath Poosala , Sridhar Ramaswamy, The Aqua approximate query answering system, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.574-576, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
2
|
Vijay Arya , Naveen Garg , Rohit Khandekar , Adam Meyerson , Kamesh Munagala , Vinayaka Pandit, Local search heuristic for k-median and facility location problems, Proceedings of the thirty-third annual ACM symposium on Theory of computing, p.21-29, July 2001, Hersonissos, Greece
[doi> 10.1145/380752.380755]
|
 |
3
|
|
| |
4
|
C. Buragohain, N. Shrivastava, and S. Suri. Space efficient streaming algorithms for the maximum error histogram. Proc of. ICDE, pages 1026--1035, 2007.
|
 |
5
|
|
 |
6
|
Moses Charikar , Chandra Chekuri , Tomás Feder , Rajeev Motwani, Incremental clustering and dynamic information retrieval, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.626-635, May 04-06, 1997, El Paso, Texas, United States
[doi> 10.1145/258533.258657]
|
 |
7
|
|
 |
8
|
Julia Chuzhoy , Sudipto Guha , Eran Halperin , Sanjeev Khanna , Guy Kortsarz , Robert Krauthgamer , Joseph (Seffi) Naor, Asymmetric k-center is log* n-hard to approximate, Journal of the ACM (JACM), v.52 n.4, p.538-551, July 2005
[doi> 10.1145/1082036.1082038]
|
 |
9
|
|
| |
10
|
|
| |
11
|
S. Guha and B. Harb. Approximation algorithms for wavelet transform coding of data streams. IEEE Trans. of Info. Theory, 2008.
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
|