|
ABSTRACT
Recently, there has been an increased focus on modeling uncertainty by distributions. Suppose we wish to compute a function of a stream whose elements are samples drawn independently from some distribution. The distribution is unknown, but the order in which the samples are presented to us will not be completely adversarial. In this paper, we investigate the importance of the ordering of a data stream, without making any assumptions about the actual distribution of the data. Using quantiles as an example application, we show that we can design provably better algorithms, and settle several open questions on the impact of order on streams. With the recent impetus in the investigation of models for sensor networks, we believe that our approach will allow the construction of novel and significantly improved algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Daniel J. Abadi , Don Carney , Ugur Çetintemel , Mitch Cherniack , Christian Convey , Sangdon Lee , Michael Stonebraker , Nesime Tatbul , Stan Zdonik, Aurora: a new model and architecture for data stream management, The VLDB Journal — The International Journal on Very Large Data Bases, v.12 n.2, p.120-139, August 2003
[doi> 10.1007/s00778-003-0095-z]
|
| |
2
|
A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, R. Motwani, I. Nishizawa, U. Srivastava, D. Thomas, R. Varma, and J. Widom. Stream: The Stanford stream data manager. IEEE Data Engineering Bulletin, 26(1):19--26, 2003.
|
 |
3
|
|
| |
4
|
|
 |
5
|
Surajit Chaudhuri , Rajeev Motwani , Vivek Narasayya, Random sampling for histogram construction: how much is enough?, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.436-447, June 01-04, 1998, Seattle, Washington, United States
|
| |
6
|
|
| |
7
|
A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. Proc. of VLDB Conference (Best Paper), pages 588--599, 2004.
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. How to summarize the universe: Dynamic maintenance of quantiles. In Proc. 28th International Conference on Very Large Data Bases, pages 454--465, 2002.
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
 |
15
|
|
| |
16
|
|
| |
17
|
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report 1998-001, DEC Systems Research Center, 1998.
|
 |
18
|
|
| |
19
|
S. Kannan. Open problems in streaming. PPT Slides, (request source).
|
 |
20
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Random sampling techniques for space efficient online computation of order statistics of large datasets, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.251-262, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
21
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.426-435, June 01-04, 1998, Seattle, Washington, United States
|
 |
22
|
|
| |
23
|
J. Munro and M. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, 12:315--323, 1980.
|
| |
24
|
S. Muthukrishnan. Data streams: Algorithms and applications. Survey available on request at muthu@research.att.com, 2003.
|
|