|
ABSTRACT
We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using O(ε-2 log(δ-1) log m) words of space. Our algorithm is based upon a novel extension of a method introduced by Alon, Matias, and Szegedy [1]. We show a space lower bound of Ω(ε-2 / log(ε-1)), meaning that our algorithm is near-optimal in terms of its dependency on ε. This improves over previous work on this problem [8, 13, 17, 5]. We show that generalizing to kth order entropy requires close to linear space for all k ≥ 1, and give additive approximations using our algorithm. Lastly, we show how to compute a multiplicative approximation to the entropy of a random walk on an undirected graph.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
Ziv Bar-Yossef , Ravi Kumar , D. Sivakumar, Reductions in streaming algorithms, with an application to counting triangles in graphs, Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.623-632, January 06-08, 2002, San Francisco, California
|
| |
5
|
|
 |
6
|
Lakshminath Bhuvanagiri , Sumit Ganguly , Deepanjan Kesh , Chandan Saha, Simpler algorithm for estimating frequency moments of data streams, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, p.708-713, January 22-26, 2006, Miami, Florida
[doi> 10.1145/1109557.1109634]
|
| |
7
|
P. Bose, E. Kranakis, P. Morin, and Y. Tang, Bounds for frequency estimation of packet streams, in SIROCCO, 2003.
|
| |
8
|
A. Chakrabarti, K. Do Ba, and S. Muthukrishnan, Estimating entropy and entropy norm on data streams., in STACS, 2006, pp. 196--205.
|
| |
9
|
A. Chakrabarti, S. Khot, and X. Sun, Near-optimal lower bounds on the multi-party communication complexity of set disjointness, in CCC, 2003, pp. 107--117.
|
 |
10
|
|
| |
11
|
|
| |
12
|
Y. Gu, A. McCallum, and D. Towsley, Detecting anomalies in network traffic using maximum entropy estimation., in Proc. Internet Measurement Conference, 2005.
|
 |
13
|
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
| |
18
|
J. Misra and D. Gries, Finding repeated elements, Science of Computer Programming, 2 (1982), pp. 143--152.
|
| |
19
|
N. Nisan, Pseudorandom generators for space-bounded computation, Combinatorica, 12 (1992), pp. 449--461.
|
| |
20
|
|
| |
21
|
|
 |
22
|
Kuai Xu , Zhi-Li Zhang , Supratik Bhattacharyya, Profiling internet backbone traffic: behavior models and applications, Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, August 22-26, 2005, Philadelphia, Pennsylvania, USA
|
CITED BY 9
|
|
Haiquan (Chuck) Zhao , Ashwin Lall , Mitsunori Ogihara , Oliver Spatscheck , Jia Wang , Jun Xu, A data streaming algorithm for estimating entropies of od flows, Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, October 24-26, 2007, San Diego, California, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Radu Berinde , Graham Cormode , Piotr Indyk , Martin J. Strauss, Space-optimal heavy hitters with strong error bounds, Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 29-July 01, 2009, Providence, Rhode Island, USA
|
|