ACM Home Page
Please provide us with feedback. Feedback
Declaring independence via the sketching of sketches
Full text PdfPdf (368 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
San Francisco, California
Pages 737-745  
Year of Publication: 2008
Authors
Sponsors
: SIAM Activity Group on Discrete Mathematics
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 11,   Downloads (12 Months): 66,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We consider the problem of identifying correlations in data streams. Surprisingly, our work seems to be the first to consider this natural problem. In the centralized model, we consider a stream of pairs (i,j) ∈ [n]2 whose frequencies define a joint distribution (X,Y). In the distributed model, each coordinate of the pair may appear separately in the stream. We present a range of algorithms for approximating to what extent X and Y are independent, i.e., how close the joint distribution is to the product of the marginals. We consider various measures of closeness including ℓ1, ℓ2, and the mutual information between X and Y. Our algorithms are based on "sketching sketches", i.e., composing small-space linear synopses of the distributions. Perhaps ironically, the biggest technical challenges that arise relate to ensuring that different components of our estimates are sufficiently independent.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
4
 
5
 
6
7
 
8
 
9
A. Chakrabarti, S. Khot, and X. Sun, Near-optimal lower bounds on the multi-party communication complexity of set disjointness., in IEEE Conference on Computational Complexity, 2003, pp. 107--117.
10
 
11
12
 
13
14
 
15
 
16
 
17
18
19
 
20
S. Guha, P. Indyk, and A. McGregor, Sketching information divergences, in Conference on Learning Theory, 2007, pp. 424--438.
21
 
22
S. Guha and A. McGregor, Lower bounds for quantile estimation in random-order and multipass streaming, in International Colloquium on Automata, Languages and Programming, 2007, pp. 704--715.
 
23
S. Guha and A. McGregor, Space-efficient sampling, in AISTATS, 2007, pp. 169--176.
24
25
 
26
T. S. Jayram, R. Kumar, and D. Sivakumar, Simple lower bound on one-way Gap-Hamming., in www.cse.iitk.ac.in/users/sganguly/slides/ravikumar.pdf, 2007.
 
27
 
28
 
29


Collaborative Colleagues:
Piotr Indyk: colleagues
Andrew McGregor: colleagues