|
ABSTRACT
We propose Compressed Counting (CC) for approximating the αth frequency moments (0 < α ≤ 2) of data streams under a relaxed strict-Turnstile model, using maximally-skewed stable random projections. Estimators based on the geometric mean and the harmonic mean are developed. When α = 1, a simple counter suffices for counting the first moment (i.e., sum). The geometric mean estimator of CC has asymptotic variance α Δ = |α - 1|, capturing the intuition that the complexity should decrease as Δ = |α - 1| → 0. However, the previous classical algorithms based on symmetric stable random projections[12, 15] required O (1/ε2) space, in order to approximate the αth moments within a 1 + ε factor, for any 0 < α ≤ 2 including α = 1. We show that using the geometric mean estimator, CC requires O [EQUATION] space, as Δ → 0. Therefore, in the neighborhood of α = 1, the complexity of CC is essentially O (1/ε) instead of O (1/ε2). CC may be useful for estimating Shannon entropy, which can be approximated by certain functions of the αth moments with α → 1. [10, 9] suggested using α = 1 + Δ with (e.g.,) Δ < 0.0001 and ε < 10−7, to rigorously ensure reasonable approximations. Thus, unfortunately, CC is "theoretically impractical" for estimating Shannon entropy, despite its empirical success reported in [16].
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
 |
3
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
| |
4
|
|
| |
5
|
J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. JASA, 71(354):340--344, 1976.
|
| |
6
|
|
| |
7
|
|
| |
8
|
I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series, and Products. Academic Press, New York, 2000.
|
| |
9
|
|
| |
10
|
N. J. A. Harvey, J. Nelson, and K. Onak. Streaming algorithms for estimating entropy. In ITW'08.
|
| |
11
|
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on Data Streams. American Mathematical Society, 1999.
|
 |
12
|
|
 |
13
|
|
| |
14
|
C. D. Hardin Jr. Skewed stable variables and processes. Technical Report 79, University of North Carolina, 1984.
|
| |
15
|
|
| |
16
|
P. Li. A very efficient scheme for estimating entropy of data streams using compressed counting. arxiv.org/PS_cache/arxiv/pdf/0808/0808.1771v2.pdf, 2008.
|
| |
17
|
P. Li, D. Paul, R. Narasimhan, and J. Cioffi. On the distribution of SINR for the MMSE MIMO receiver and performance analysis. IEEE Trans. Inform. Theory, 52(1):271--286, 2006.
|
| |
18
|
|
| |
19
|
A. Rényi. On measures of information and entropy. In The 4th Berkeley Symposium on Mathematics, Statistics and Probability 1960, 547--561, 1961.
|
 |
20
|
|
| |
21
|
C. Tsallis. Possible generalization of boltzmann-gibbs statistics. Journal of Statistical Physics, 52:479--487, 1988.
|
| |
22
|
|
 |
23
|
Haiquan (Chuck) Zhao , Ashwin Lall , Mitsunori Ogihara , Oliver Spatscheck , Jia Wang , Jun Xu, A data streaming algorithm for estimating entropies of od flows, Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, October 24-26, 2007, San Diego, California, USA
[doi> 10.1145/1298306.1298345]
|
| |
24
|
V. M. Zolotarev. One-dimensional Stable Distributions. American Mathematical Society, 1986.
|
|