ACM Home Page
Please provide us with feedback. Feedback
Maintaining variance and k-medians over data stream windows
Full text PdfPdf (253 KB)
Source Symposium on Principles of Database Systems archive
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
San Diego, California
Pages: 234 - 243  
Year of Publication: 2003
ISBN:1-58113-670-6
Authors
Brain Babcock  Stanford University, Stanford, CA
Mayur Datar  Stanford University, Stanford, CA
Rajeev Motwani  Stanford University, Stanford, CA
Liadan O'Callaghan  Stanford University, Stanford, CA
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 92,   Citation Count: 32
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/773153.773176
What is a DOI?

ABSTRACT

The sliding window model is useful for discounting stale data in data stream applications. In this model, data elements arrive continually and only the most recent N elements are used when answering queries. We present a novel technique for solving two important and related problems in the sliding window model---maintaining variance and maintaining a k--median clustering. Our solution to the problem of maintaining variance provides a continually updated estimate of the variance of the last N values in a data stream with relative error of at most ε using O(1/ε2 log N) memory. We present a constant-factor approximation algorithm which maintains an approximate k--median solution for the last N data points using O(k/τ4 N log2 N) memory, where τ < 1/2 is a parameter which trades off the space bound with the approximation factor of O(2O(1/τ)).


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
 
4
P. S. Bradley, U. M. Fayyad, and C. Reina. "Scaling clustering algorithms to large databases." In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD), 1998, pages 9--15.
 
5
6
 
7
8
9
10
 
11
 
12
13
 
14
M.R. Henzinger, P. Raghavan, and S. Rajagopalan "Computing on Data Streams.", Technical Report 1998--011, Compaq Systems Research Center, Palo Alto, CA, May, 1998.
15
 
16
 
17
 
18
 
19
L. O'Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani. "Streaming-Data Algorithms for High-Quality Clustering." In Proceedings of the 18th Annual IEEE International Conference on Data Engineering (ICDE), 2001.
20
21
22

CITED BY  32

Collaborative Colleagues:
Brain Babcock: colleagues
Mayur Datar: colleagues
Rajeev Motwani: colleagues
Liadan O'Callaghan: colleagues