| Maintaining variance and k-medians over data stream windows |
| Full text |
Pdf
(253 KB)
|
| Source
|
Symposium on Principles of Database Systems
archive
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
table of contents
San Diego, California
Pages: 234 - 243
Year of Publication: 2003
ISBN:1-58113-670-6
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 14, Downloads (12 Months): 96, Citation Count: 32
|
|
|
ABSTRACT
The sliding window model is useful for discounting stale data in data stream applications. In this model, data elements arrive continually and only the most recent N elements are used when answering queries. We present a novel technique for solving two important and related problems in the sliding window model---maintaining variance and maintaining a k--median clustering. Our solution to the problem of maintaining variance provides a continually updated estimate of the variance of the last N values in a data stream with relative error of at most ε using O(1/ε2 log N) memory. We present a constant-factor approximation algorithm which maintains an approximate k--median solution for the last N data points using O(k/τ4 N2τ log2 N) memory, where τ < 1/2 is a parameter which trades off the space bound with the approximation factor of O(2O(1/τ)).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Vijay Arya , Naveen Garg , Rohit Khandekar , Kamesh Munagala , Vinayaka Pandit, Local search heuristic for k-median and facility location problems, Proceedings of the thirty-third annual ACM symposium on Theory of computing, p.21-29, July 2001, Hersonissos, Greece
[doi> 10.1145/380752.380755]
|
 |
2
|
Brian Babcock , Shivnath Babu , Mayur Datar , Rajeev Motwani , Jennifer Widom, Models and issues in data stream systems, Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 03-05, 2002, Madison, Wisconsin
[doi> 10.1145/543613.543615]
|
| |
3
|
|
| |
4
|
P. S. Bradley, U. M. Fayyad, and C. Reina. "Scaling clustering algorithms to large databases." In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD), 1998, pages 9--15.
|
| |
5
|
|
 |
6
|
|
| |
7
|
Mayur Datar , Aristides Gionis , Piotr Indyk , Rajeev Motwani, Maintaining stream statistics over sliding windows: (extended abstract), Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, p.635-644, January 06-08, 2002, San Francisco, California
|
 |
8
|
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
|
 |
13
|
Sudipto Guha , Rajeev Rastogi , Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.73-84, June 01-04, 1998, Seattle, Washington, United States
|
| |
14
|
M.R. Henzinger, P. Raghavan, and S. Rajagopalan "Computing on Data Streams.", Technical Report 1998--011, Compaq Systems Research Center, Palo Alto, CA, May, 1998.
|
 |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
Nina Mishra , Dan Oblinger , Leonard Pitt, Sublinear time approximate clustering, Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, p.439-447, January 07-09, 2001, Washington, D.C., United States
|
| |
19
|
L. O'Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani. "Streaming-Data Algorithms for High-Quality Clustering." In Proceedings of the 18th Annual IEEE International Conference on Data Engineering (ICDE), 2001.
|
 |
20
|
|
 |
21
|
|
 |
22
|
Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada
|
CITED BY 32
|
|
|
|
|
|
|
|
Mohamed Medhat Gaber , Shonali Krishnaswamy , Arkady Zaslavsky, Cost-efficient mining techniques for data streams, Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation, p.109-114, January 01, 2004, Dunedin, New Zealand
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
S. Subramaniam , T. Palpanas , D. Papadopoulos , V. Kalogeraki , D. Gunopulos, Online outlier detection in sensor data using non-parametric models, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Lu-An Tang , Bin Gui , Hong-Yan Li , Gao-Shan Miao , Dong-Qing Yang , Xin-Biao Zhou, PGG: an online pattern based approach for stream variation management, Journal of Computer Science and Technology, v.23 n.4, p.497-515, July 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|