ACM Home Page
Please provide us with feedback. Feedback
Compressed counting
Full text PdfPdf (668 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages 412-421  
Year of Publication: 2009
Author
Ping Li  Cornell University, Ithaca NY
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 64,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We propose Compressed Counting (CC) for approximating the αth frequency moments (0 < α ≤ 2) of data streams under a relaxed strict-Turnstile model, using maximally-skewed stable random projections. Estimators based on the geometric mean and the harmonic mean are developed.

When α = 1, a simple counter suffices for counting the first moment (i.e., sum). The geometric mean estimator of CC has asymptotic variance α Δ = |α - 1|, capturing the intuition that the complexity should decrease as Δ = |α - 1| → 0. However, the previous classical algorithms based on symmetric stable random projections[12, 15] required O (1/ε2) space, in order to approximate the αth moments within a 1 + ε factor, for any 0 < α ≤ 2 including α = 1.

We show that using the geometric mean estimator, CC requires O [EQUATION] space, as Δ → 0. Therefore, in the neighborhood of α = 1, the complexity of CC is essentially O (1/ε) instead of O (1/ε2).

CC may be useful for estimating Shannon entropy, which can be approximated by certain functions of the αth moments with α → 1. [10, 9] suggested using α = 1 + Δ with (e.g.,) Δ < 0.0001 and ε < 10−7, to rigorously ensure reasonable approximations. Thus, unfortunately, CC is "theoretically impractical" for estimating Shannon entropy, despite its empirical success reported in [16].


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
 
4
 
5
J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. JASA, 71(354):340--344, 1976.
 
6
 
7
 
8
I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series, and Products. Academic Press, New York, 2000.
 
9
 
10
N. J. A. Harvey, J. Nelson, and K. Onak. Streaming algorithms for estimating entropy. In ITW'08.
 
11
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on Data Streams. American Mathematical Society, 1999.
12
13
 
14
C. D. Hardin Jr. Skewed stable variables and processes. Technical Report 79, University of North Carolina, 1984.
 
15
 
16
P. Li. A very efficient scheme for estimating entropy of data streams using compressed counting. arxiv.org/PS_cache/arxiv/pdf/0808/0808.1771v2.pdf, 2008.
 
17
P. Li, D. Paul, R. Narasimhan, and J. Cioffi. On the distribution of SINR for the MMSE MIMO receiver and performance analysis. IEEE Trans. Inform. Theory, 52(1):271--286, 2006.
 
18
 
19
A. Rényi. On measures of information and entropy. In The 4th Berkeley Symposium on Mathematics, Statistics and Probability 1960, 547--561, 1961.
20
 
21
C. Tsallis. Possible generalization of boltzmann-gibbs statistics. Journal of Statistical Physics, 52:479--487, 1988.
 
22
23
 
24
V. M. Zolotarev. One-dimensional Stable Distributions. American Mathematical Society, 1986.