|
ABSTRACT
The probabilistic-stream model was introduced by Jayram et al. [20].It is a generalization of the data stream model that issuited to handling "probabilistic" data, where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over apotentially exponential number of classical "deterministic" streams where each item is deterministically one of the domain values. Designing efficient aggregation algorithms for probabilistic data is crucial for handling uncertainty in data-centric applications such as OLAP. Such algorithms are also useful in a variety of other setting including analyzing search engine traffic and aggregation in sensor networks. We present algorithms for computing commonly used aggregates ona probabilistic stream. We present the first one pass streaming algorithms for estimating the expected mean of a probabilistic stream, improving upon results in [20]. Next, we consider the problem of estimating frequency moments for probabilistic data. We propose a general approach to obtain unbiased estimators working over probabilistic data by utilizing unbiased estimators designed for standard streams. Applying this approach, we extend a classical data stream algorithm to obtain a one-pass algorithm for estimating F2, the second frequency moment. We present the first known streaming algorithms forestimating F0, the number of distinct items on probabilistic streams.Our work also gives an efficient one-pass algorithm for estimatingthe median of a probabilistic stream.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
Doug Burdick , Prasad M. Deshpande , T. S. Jayram , Raghu Ramakrishnan , Shivakumar Vaithyanathan, OLAP over uncertain and imprecise data, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
7
|
Doug Burdick , Prasad M. Deshpande , T. S. Jayram , Raghu Ramakrishnan , Shivakumar Vaithyanathan, OLAP over uncertain and imprecise data, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
8
|
Doug Burdick , Prasad M. Deshpande , T. S. Jayram , Raghu Ramakrishnan , Shivakumar Vaithyanathan, Efficient allocation algorithms for OLAP over imprecise data, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
 |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In International Conference on Very Large Data Bases (VLDB), pages 864--875, 2004.
|
| |
13
|
Joan Feigenbaum , Sampath Kannan , Andrew McGregor , Siddharth Suri , Jian Zhang, Graph distances in the streaming model: the value of space, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, January 23-25, 2005, Vancouver, British Columbia
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
| |
18
|
S. Guha and A. McGregor. Space-Efficient Sampling. In AISTATS, pages 169--176, 2007.
|
 |
19
|
|
| |
20
|
T. Jayram, S. Kale, and E. Vee. Efficient aggregation algorithms for probabilistic data. In ACM-SIAM Symposium on Discrete Algorithms, 2007.
|
| |
21
|
S. Kannan and A. McGregor. More on reconstructing strings from random traces: Insertions and deletions. In IEEE International Symposium on Information Theory, pages 297--301, 2005.
|
| |
22
|
J. I. Munro and M. Paterson. Selection and sorting with limited storage. Theor. Comput. Sci., 12:315--323, 1980.
|
| |
23
|
S. Muthukrishnan. Data streams: Algorithms and applications. Now Publishers, 2006.
|
CITED BY 9
|
|
|
|
|
Evan Welbourne , Nodira Khoussainova , Julie Letchner , Yang Li , Magdalena Balazinska , Gaetano Borriello , Dan Suciu, Cascadia: A System for Specifying, Detecting, and Managing RFID Events, Proceeding of the 6th international conference on Mobile systems, applications, and services, June 17-20, 2008, Breckenridge, CO, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|