ACM Home Page
Please provide us with feedback. Feedback
Fast, small-space algorithms for approximate histogram maintenance
Full text PdfPdf (266 KB)
Source Annual ACM Symposium on Theory of Computing archive
Proceedings of the thiry-fourth annual ACM symposium on Theory of computing table of contents
Montreal, Quebec, Canada
SESSION: Session 7A table of contents
Pages: 389 - 398  
Year of Publication: 2002
ISBN:1-58113-495-9
Authors
Anna C. Gilbert  AT&T Labs---Research
Sudipto Guha  University of Pennsylvania
Piotr Indyk  MIT
Yannis Kotidis  AT&T Labs---Research
S. Muthukrishnan  AT&T Labs---Research
Martin J. Strauss  AT&T Labs---Research
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 17,   Downloads (12 Months): 83,   Citation Count: 55
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/509907.509966
What is a DOI?

ABSTRACT

(MATH) A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A3." We give a sketching algorithm, that constructs a small sketch from the stream of updates, and a reconstruction algorithm, that produces a B-bucket piecewise-constant representation (histogram) H for A from the sketch, such that ||A—H||&xie;(1+&egr;)||A—Hopt||, where the error ||A—H|| is either $\ell_1$ (absolute) or $\ell_2$ (root-mean-square) error. The time to process a single update, time to reconstruct the histogram, and size of the sketch are each bounded by poly(B,log(N),log||A,1/&egr;. Our result is obtained in two steps. First we obtain what we call a robust histogram approximation for A, a histogram such that adding a small number of buckets does not help improve the representation quality significantly. From the robust histogram, we cull a histogram of desired accruacy and B buckets in the second step. This technique also provides similar results for Haar wavelet representations, under $\ell_2$ error. Our results have applications in summarizing data distributions fast and succinctly even in distributed settings.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
 
4
 
5
 
6
 
7
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, M. Strauss. QuickSAND: Quick Summary and Analysis of Network Data DIMACS Technical Report 2001-43.
 
8
S. Guha, N. Koudas. Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation. ICDE 2002.
9
 
10
 
11
12
 
13
 
14
M. Naor, O. Reingold. Private communication, March, 1999.
15
 
16
V. Poosala. Histograms for selecitivty estimation. PhD Thesis, U. Wisconsin, Madison. 1997.
17

CITED BY  55

Collaborative Colleagues:
Anna C. Gilbert: colleagues
Sudipto Guha: colleagues
Piotr Indyk: colleagues
Yannis Kotidis: colleagues
S. Muthukrishnan: colleagues
Martin J. Strauss: colleagues