|
ABSTRACT
We consider the problem of maintaining ε-approximate counts and quantiles over a stream sliding window using limited space. We consider two types of sliding windows depending on whether the number of elements N in the window is fixed (fixed-size sliding window) or variable (variable-size sliding window). In a fixed-size sliding window, both the ends of the window slide synchronously over the stream. In a variable-size sliding window, an adversary slides the window ends independently, and therefore has the ability to vary the number of elements N in the window.We present various deterministic and randomized algorithms for approximate counts and quantiles. All of our algorithms require O(1/ε polylog(1/ε, N)) space. For quantiles, this space requirement is an improvement over the previous best bound of O(1/ε2 polylog(1/ε, N)). We believe that no previous work on space-efficient approximate counts over sliding windows exists.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
 |
2
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
| |
3
|
|
 |
4
|
Brain Babcock , Mayur Datar , Rajeev Motwani , Liadan O'Callaghan, Maintaining variance and k-medians over data stream windows, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.234-243, June 09-11, 2003, San Diego, California
[doi> 10.1145/773153.773176]
|
| |
5
|
M. Blum, R. W. Floyd, V. R. Pratt, R. L. Rivest, and R. E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7(4):448--461, Aug. 1973.
|
| |
6
|
|
 |
7
|
|
| |
8
|
G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. In LATIN 2004, pages 29--38, Apr. 2004.
|
| |
9
|
Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
|
 |
14
|
|
 |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proc. of the 28th Intl. Conf. on Very Large Data Bases, pages 356--357, Aug. 2002.
|
 |
19
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.426-435, June 01-04, 1998, Seattle, Washington, United States
|
 |
20
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Random sampling techniques for space efficient online computation of order statistics of large datasets, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.251-262, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
21
|
J. Misra and D. Gries. Finding repeated elements. Sci. Comput. Programming, 2(2):143--152, Nov. 1982.
|
| |
22
|
R. Motwani, J. Widom, et al. Query processing, approximation, and resource management in a data stream management system. In Proc. of the 1st Conf. on Innovative Data Systems Research, pages 245--256, Jan. 2003.
|
| |
23
|
J. I. Munro and M. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, pages 315--323, 1980.
|
| |
24
|
|
| |
25
|
I. Pohl. A minimum storage algorithm for computing the median. Technical Report IBM Research Report RC 2701 (# 12713), IBM T. J. Watson Center, 1969.
|
| |
26
|
|
 |
27
|
|
CITED BY 23
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Qi (George) Zhao , Mitsunori Ogihara , Haixun Wang , Jun (Jim) Xu, Finding global icebergs over distributed data sets, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
|
|
|
|
|
|
|