ACM Home Page
Please provide us with feedback. Feedback
A near-optimal algorithm for computing the entropy of a stream
Full text PdfPdf (442 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
New Orleans, Louisiana
Pages: 328 - 335  
Year of Publication: 2007
ISBN:978-0-898716-24-5
Authors
Amit Chakrabarti  Dartmouth College
Graham Cormode  Lucent Bell Laboratories
Andrew McGregor  ONR
Sponsors
: SIAM Activity Group on Discrete Mathematics
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 86,   Citation Count: 9
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using O-2 log(δ-1) log m) words of space. Our algorithm is based upon a novel extension of a method introduced by Alon, Matias, and Szegedy [1]. We show a space lower bound of Ω(ε-2 / log(ε-1)), meaning that our algorithm is near-optimal in terms of its dependency on ε. This improves over previous work on this problem [8, 13, 17, 5]. We show that generalizing to kth order entropy requires close to linear space for all k ≥ 1, and give additive approximations using our algorithm. Lastly, we show how to compute a multiplicative approximation to the entropy of a random walk on an undirected graph.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
 
5
6
 
7
P. Bose, E. Kranakis, P. Morin, and Y. Tang, Bounds for frequency estimation of packet streams, in SIROCCO, 2003.
 
8
A. Chakrabarti, K. Do Ba, and S. Muthukrishnan, Estimating entropy and entropy norm on data streams., in STACS, 2006, pp. 196--205.
 
9
A. Chakrabarti, S. Khot, and X. Sun, Near-optimal lower bounds on the multi-party communication complexity of set disjointness, in CCC, 2003, pp. 107--117.
10
 
11
 
12
Y. Gu, A. McCallum, and D. Towsley, Detecting anomalies in network traffic using maximum entropy estimation., in Proc. Internet Measurement Conference, 2005.
13
 
14
 
15
16
17
 
18
J. Misra and D. Gries, Finding repeated elements, Science of Computer Programming, 2 (1982), pp. 143--152.
 
19
N. Nisan, Pseudorandom generators for space-bounded computation, Combinatorica, 12 (1992), pp. 449--461.
 
20
 
21
22

CITED BY  9
Collaborative Colleagues:
Amit Chakrabarti: colleagues
Graham Cormode: colleagues
Andrew McGregor: colleagues