ACM Home Page
Please provide us with feedback. Feedback
Estimating statistical aggregates on probabilistic data streams
Full text PdfPdf (228 KB)
Source
ACM Transactions on Database Systems (TODS) archive
Volume 33 ,  Issue 4  (November 2008) table of contents
Article No. 26  
Year of Publication: 2008
ISSN:0362-5915
Authors
T. S. Jayram  IBM Almaden Research, Almaden, CA
Andrew McGregor  University of Massachusetts, Amherst, Amherst, MA
S. Muthukrishnan  Google, Inc., New York, NY
Erik Vee  Yahoo! Research, Sunnyvale, CA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 19,   Downloads (12 Months): 178,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1412331.1412338
What is a DOI?

ABSTRACT

The probabilistic stream model was introduced by Jayram et al. [2007]. It is a generalization of the data stream model that is suited to handling probabilistic data, where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over a potentially exponential number of classical deterministic streams, where each item is deterministically one of the domain values.

We present algorithms for computing commonly used aggregates on a probabilistic stream. We present the first one pass streaming algorithms for estimating the expected mean of a probabilistic stream. Next, we consider the problem of estimating frequency moments for probabilistic data. We propose a general approach to obtain unbiased estimators working over probabilistic data by utilizing unbiased estimators designed for standard streams. Applying this approach, we extend a classical data stream algorithm to obtain a one-pass algorithm for estimating F2, the second frequency moment. We present the first known streaming algorithms for estimating F0, the number of distinct items on probabilistic streams. Our work also gives an efficient one-pass algorithm for estimating the median, and a two-pass algorithm for estimating the range.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
 
4
 
5
 
6
 
7
8
9
10
 
11
 
12
 
13
 
14
15
16
 
17
Guha, S. and McGregor, A. 2007. Space-efficient sampling. In AISTATS. 169--176.
18
 
19
 
20
Kannan, S. and McGregor, A. 2005. More on reconstructing strings from random traces: Insertions and deletions. In IEEE International Symposium on Information Theory. 297--301.
 
21
Lipson, J. D. 1981. Elements of Algebra and Algebraic Computing. Addison-Wesley Publishing Company.
 
22
Munro, J. I. and Paterson, M. 1980. Selection and sorting with limited storage. Theor. Comput. Sci. 12, 315--323.
 
23
Muthukrishnan, S. 2006. Data Streams: Algorithms and Applications. Now Publishers.

Collaborative Colleagues:
T. S. Jayram: colleagues
Andrew McGregor: colleagues
S. Muthukrishnan: colleagues
Erik Vee: colleagues