ACM Home Page
Please provide us with feedback. Feedback
Sampling from a moving window over streaming data
Full text PdfPdf (213 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
San Francisco, California
Pages: 633 - 634  
Year of Publication: 2002
ISBN:0-89871-513-X
Authors
Brian Babcock  Stanford University, CA
Mayur Datar  Stanford University, CA
Rajeev Motwani  Stanford University, CA
Sponsor
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 10,   Downloads (12 Months): 120,   Citation Count: 43
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We introduce the problem of sampling from a moving window of recent items from a data stream and develop two algorithms for this problem. The first algorithm, "chain-sample", extends reservoir sampling to deal with the expiration of data elements from the sample. The expected memory usage of our algorithm is O(k) when maintaining a sample of size k over a window of the n most recent elements from the data stream, and with high probability the algorithm requires no more than O(k log n) memory.When the number of elements in the window is variable, as is the case when the size of the window is defined as a time duration rather than as a fixed number of data elements, the sampling problem becomes harder. Our second algorithm, "priority-sample", works even when the number of elements in the window can vary dynamically over time. With high probability, the "priority-sample" algorithm uses no more than O(k log n) memory.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
C. R. Aragon and R. G. Seidel, Randomized Search Trees, Proc. of the 30th IEEE Symp. on Foundations of Computer Science, 1989, pp. 540-545.
2
 
3
K. Mulmuley, Computational Geometry: An Introduction through Randomized Algorithms, Prentice Hall, Englewood Cliffs, New Jersey, 1994.
4
 
5
 
6

CITED BY  43
Collaborative Colleagues:
Brian Babcock: colleagues
Mayur Datar: colleagues
Rajeev Motwani: colleagues