ACM Home Page
Please provide us with feedback. Feedback
Estimating simple functions on the union of data streams
Full text PdfPdf (1.38 MB)
Source ACM Symposium on Parallel Algorithms and Architectures archive
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures table of contents
Crete Island, Greece
Pages: 281 - 291  
Year of Publication: 2001
ISBN:1-58113-409-6
Authors
Phillip B. Gibbons  Information Sciences Research Center, Bell Laboratories, Murray Hill, NJ
Srikanta Tirthapura  Computer Science Department, Brown University, Providence, RI
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 30,   Citation Count: 34
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues   peer to peer  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/378580.378687
What is a DOI?

ABSTRACT

Massive data sets often arise as physically distributed, parallel data streams. We present algorithms for estimating simple functions on the union of such data streams, while using only logarithmic space per stream. Each processor observes only its own stream, and communicates with the other processors only after observing its entire stream. This models the set-up in current network monitoring products. Our algorithms employ a novel coordinated sampling technique to extract a sample of the union; this sample can be used to estimate aggregate functions on the union. The technique can also be used to estimate aggregate functions over the distinct “labels” in one or more data streams, e.g., to determine the zeroth frequency moment (i.e., the number of distinct labels) in one or more data streams. Our space and time bounds are the best known for these problems, and our logarithmic space bounds for coordinated sampling contrast with polynomial lower bounds for independent sampling. We relate our distributed streams model to previously studied non-distributed (i.e., merged) streams models, presenting tight bounds on the gap between the distributed and merged models for deterministic algorithms.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
4
5
6
 
7
 
8
 
9
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. Testing and spot-checking of data streams. Technical report, AT&T Shannon Laboratories, Florham Park, N J, July 1999.
 
10
 
11
12
 
13
 
14
 
15
 
16
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical report, Digital Systems Research Center, Palo Alto, CA, May 1998.
 
17
P. Indyk. A small approximately min-wise independent family of hash functions. Technical report, Stanford University, Palo Alto, CA, Nov. 1998.
 
18
 
19
 
20
 
21
 
22
23
 
24
N. Nisan and D. Ron. Private communication, October-November 2000.
 
25
Transaction processing performance council (TPC). TPC Benchmarks, 2000. URL: www. tpc. org.
26

CITED BY  34
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Collaborative Colleagues:
Phillip B. Gibbons: colleagues
Srikanta Tirthapura: colleagues

Peer to Peer - Readers of this Article have also read: