| Sampling from a moving window over streaming data |
| Source
|
Symposium on Discrete Algorithms
Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
San Francisco, California
Pages: 633-634
Year of Publication: 2002
ISBN: 0-89871-513-X
|
|
Authors
|
|
| Publisher |
Society for Industrial and Applied Mathematics
Philadelphia, PA, USA
|
ABSTRACT
We introduce the problem of sampling from a moving window of recent items from a data stream and develop two algorithms for this problem. The first algorithm, "chain-sample", extends reservoir sampling to deal with the expiration of data elements from the sample. The expected memory usage of our algorithm is O(k) when maintaining a sample of size k over a window of the n most recent elements from the data stream, and with high probability the algorithm requires no more than O(k log n) memory.

When the number of elements in the window is variable, as is the case when the size of the window is defined as a time duration rather than as a fixed number of data elements, the sampling problem becomes harder. Our second algorithm, "priority-sample", works even when the number of elements in the window can vary dynamically over time. With high probability, the "priority-sample" algorithm uses no more than O(k log n) memory.
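
The two ideas named in the abstract can be illustrated with a minimal Python sketch for a sample of size one. This is an illustration under stated assumptions, not the authors' implementation: the class names `ChainSampler` and `PrioritySampler` are hypothetical, and the details of successor selection and priority handling follow the commonly described versions of these algorithms rather than anything spelled out in the abstract. A sample of size k could presumably be obtained by running k independent copies.

```python
import random
from collections import deque


class ChainSampler:
    """Sketch: one uniform sample over the n most recent items (count-based window)."""

    def __init__(self, n, rng=None):
        self.n = n
        self.rng = rng or random.Random()
        self.i = 0              # index of the latest item (1-based)
        self.chain = deque()    # (index, value); chain[0] is the current sample
        self.next_index = None  # index of the item that will extend the chain

    def _pick_successor(self):
        # Assumption: the successor is drawn uniformly from the next n indices.
        self.next_index = self.rng.randint(self.i + 1, self.i + self.n)

    def add(self, value):
        self.i += 1
        i = self.i
        # Drop the current sample once it falls out of the window.
        if self.chain and self.chain[0][0] <= i - self.n:
            self.chain.popleft()
        if self.rng.random() < 1.0 / min(i, self.n):
            # The new item becomes the sample; the old chain is discarded.
            self.chain.clear()
            self.chain.append((i, value))
            self._pick_successor()
        elif i == self.next_index:
            # The new item is the chosen replacement for the chain's tail.
            self.chain.append((i, value))
            self._pick_successor()

    def sample(self):
        return self.chain[0][1] if self.chain else None


class PrioritySampler:
    """Sketch: one uniform sample over a time-based window of given duration."""

    def __init__(self, window, rng=None):
        self.window = window
        self.rng = rng or random.Random()
        # (timestamp, priority, value); priorities strictly decrease left to right.
        self.items = deque()

    def _expire(self, now):
        while self.items and self.items[0][0] <= now - self.window:
            self.items.popleft()

    def add(self, timestamp, value):
        p = self.rng.random()
        # Older items with priority no higher than p can never be the maximum again.
        while self.items and self.items[-1][1] <= p:
            self.items.pop()
        self.items.append((timestamp, p, value))
        self._expire(timestamp)

    def sample(self, now):
        # The sample is the unexpired item with the highest priority.
        self._expire(now)
        return self.items[0][2] if self.items else None
```

In this sketch both samplers keep only the items that could still become the sample, which is the intuition behind the memory bounds quoted in the abstract. Timestamps passed to `PrioritySampler.add` are assumed to be non-decreasing.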
CITED BY 43
Brian Babcock, Mayur Datar, Rajeev Motwani, Liadan O'Callaghan, Maintaining variance and k-medians over data stream windows, Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.234-243, June 09-11, 2003, San Diego, California
Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
Arvind Arasu, Brian Babcock, Shivnath Babu, Jon McAlister, Jennifer Widom, Characterizing memory requirements for queries over continuous data streams, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, D. Gunopulos, Online outlier detection in sensor data using non-parametric models, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
Yijian Bai, Hetal Thakkar, Haixun Wang, Chang Luo, Carlo Zaniolo, A data stream language and system designed for power and extensibility, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
Zhang Longbo, Li Zhanhuai, Zhao Yiqiang, Yu Min, Zhang Yang, A priority random sampling algorithm for time-based sliding windows over weighted streaming data, Proceedings of the 2007 ACM symposium on Applied computing, March 11-15, 2007, Seoul, Korea
Bin Zhou, Yi Han, Jian Pei, Bin Jiang, Yufei Tao, Yan Jia, Continuous privacy preserving publishing of data streams, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
Edward Bortnikov, Maxim Gurevich, Idit Keidar, Gabriel Kliot, Alexander Shraer, Brahms: Byzantine resilient random membership sampling, Computer Networks: The International Journal of Computer and Telecommunications Networking, v.53 n.13, p.2340-2359, August, 2009